Scientific Report No. IRS-13
Information Storage and Retrieval

Correlation Measures

K. Reitsma
J. Sagalyn

Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

[The request] too is analyzed in the same manner as the documents and is represented by a description vector of concept numbers and corresponding weights. Within the system, request-document comparisons are made using a mathematical correlation coefficient. Each document is compared with the request by calculating the magnitude of the coefficient. The documents are then ranked according to the coefficient and, it is hoped, according to their degree of relevance to the request.

The purpose of this study is to evaluate several correlation coefficients to determine which one is the "best" to use. The "best" coefficient should be the one for which the largest number of relevant documents are found at the top of the ranked document list.

Some work has been done with various correlation coefficients. In 1966, Manning and Hall analyzed several correlation coefficients and proposed two of their own; however, they did not present any conclusive evidence for an evaluation. It is the aim of this study to evaluate several of the coefficients previously used by Manning and Hall, including one which they proposed, as well as a few others which have been derived from other types of coefficients.

The initial evaluation has been done on the ADI collection of 82 documents and 35 requests. Since this collection is small, any conclusion must be verified on a larger collection, such as the 200-document Cranfield collection.

2. Weighted versus Logical Description Vectors

A document description vector can take on two forms. One is a logical or binary vector in which every element is either 0 or 1. Each position in the vector represents a concept (e.g. the first position represents