Scientific Report No. ISR-10, Information Storage and Retrieval. Evaluation of Document Retrieval Systems. Joseph John Rocchio, Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The set overlap correlation function was defined in Chapter 4 as $\rho(q, d) = n(q \cap d)$, where $n(A)$ is the number of elements in the set $A$. Thus if the query $q$ contains $n_q$ keywords, all documents containing at least those same $n_q$ keywords receive correlation $n_q$. Documents containing $n_q - 1$ of the $n_q$ query terms receive correlation $n_q - 1$, etc. Therefore, the union of all document subsets with correlation $k$ or greater under the overlap matching function is equivalent to the retrieved set $R(k)$ under set inclusion matching, and thus when the retrieval criterion is allowed to vary, these matching functions are essentially equivalent.

Set represented index images lead to retrieval rankings of document subsets, whereas with vector represented index images individual documents are ranked. The difference is essentially one of degree and can be attributed to the increased information content of the vector index language.

With a vector correlation matching process the retrieved subset may be parametrically associated with a cutoff correlation (defined either absolutely or relatively with respect to the correlation distribution for each query), or with the rank position of documents in the ordering induced from the correlation coefficient (for example by defining the retrieved subset to contain the $k$ highest correlating documents). The common property of any of these alternatives is that they all yield a sequence of monotone increasing retrieved