Scientific Report No. IRS-13
Information Storage and Retrieval
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

IV. Correlation Measures

K. Reitsma and J. Sagalyn

Abstract

In this study the performance of ten matching functions is investigated. The performance is measured in terms of recall and precision. All ten functions are tested on the 82-document ADI collection; the best four are tested again on the larger 200-document Cranfield collection. It is shown that the Parker-Rhodes-Needham function has the best performance in the ADI collection below 0.50 recall; however, this function is the worst in the Cranfield collection test. Overall, the Cosine function shows the best performance.

1. Introduction

A document retrieval system, from a user's point of view, takes a request for information, in the form of a short verbal description, matches the request against the documents in the collection, and returns those which by some measure are most relevant.

Within the SMART system, all the documents have been analyzed automatically according to word frequency counts of keywords contained in a thesaurus. Each analyzed document is represented by a description vector of concept numbers with corresponding weights (the weight being proportional to the frequency of occurrence of that concept). When a request is received, it