Scientific Report No. IRS-13
Information Storage and Retrieval
Chapter: Correlation Measures
K. Reitsma, J. Sagalyn
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the elements are denoted by a second subscript, i.e. for term $i$, the vector $c_i = (c_{i1}, c_{i2}, \ldots, c_{ik})$.

$$\sum_k c_{ik}\, c_{jk} \quad \text{for all } i, j \tag{3}$$

$$\sum_k c_{ki}\, c_{kj} \quad \text{for all } i, j \tag{4}$$

The first summation gives the number of documents having both terms $i$ and $j$. The second summation gives the number of terms that documents $i$ and $j$ have in common. It is identical to expression (1) discussed previously.

3. The Correlation Coefficients

This section contains an analysis of the various correlation coefficients considered in this study. Each is analyzed according to its origin, initial interpretation, modifications made, and final interpretation as a document-document correlation coefficient.

It must be noted that there is a basic difference between the document description vector and the request description vector. The former is taken from an abstract of the article, which may consist of several sentences. The latter is taken from a very short request. In the 82-document ADI collection, the maximum number of concepts in one description vector is [OCRerr]4, and the maximum weight found is 96. Among the 35 requests, the maximum number of concepts in one description vector is 11, and the maximum weight found is [OCRerr]8. Actually, most of the weights in the request vectors are 12. It is seen therefore that the document description space is not the same as the request description space. This must be kept in mind when analyzing the