ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The set overlap correlation function was defined in Chapter 4
as:
$\rho(q,d) = n(q \cap d)$,
where n(A) is the number of elements in the set A. Thus if the query q
contains $n_q$ keywords, all documents containing at least those same $n_q$
keywords receive correlation $n_q$. Documents containing $n_q - 1$ of the $n_q$
query terms receive correlation $n_q - 1$, etc. Therefore, the union of all
document subsets with correlation k or greater under the overlap
matching function is equivalent to the retrieved set R(k) under set
inclusion matching, and thus when the retrieval criterion is allowed to
vary, these matching functions are essentially equivalent.
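
The overlap ranking described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (`overlap`, `retrieved_at_least`, `rank_subsets`) and the toy collection are hypothetical and do not come from the report. It assumes queries and documents are represented as plain keyword sets, and it shows that thresholding the overlap correlation at k yields nested retrieved sets, and that documents are ranked in subsets rather than individually.

```python
from collections import defaultdict


def overlap(query, document):
    """Set overlap correlation: number of keywords shared by query and document."""
    return len(query & document)


def retrieved_at_least(query, collection, k):
    """Names of all documents whose overlap with the query is k or greater."""
    return {name for name, terms in collection.items()
            if overlap(query, terms) >= k}


def rank_subsets(query, collection):
    """Group documents by overlap value, highest correlation first."""
    groups = defaultdict(set)
    for name, terms in collection.items():
        groups[overlap(query, terms)].add(name)
    # Keys are distinct integers, so sorting never compares the sets.
    return sorted(groups.items(), reverse=True)


# Hypothetical toy data, chosen only for illustration.
query = {"retrieval", "evaluation", "index"}
collection = {
    "d1": {"retrieval", "evaluation", "index", "vector"},
    "d2": {"retrieval", "index"},
    "d3": {"evaluation", "index", "query"},
    "d4": {"vector"},
}

# Lowering the cutoff k can only enlarge the retrieved set.
for k in range(len(query), 0, -1):
    print(k, sorted(retrieved_at_least(query, collection, k)))
```

In this toy collection d2 and d3 both share two keywords with the query, so they fall into the same subset and cannot be ordered relative to each other.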
Set represented index images lead to retrieval rankings of
document subsets, whereas with vector represented index images
individual documents are ranked. The difference is essentially one of
degree and can be attributed to the increased information content of the
vector index language. With a vector correlation matching process the
retrieved subset may be parametrically associated with a cutoff
correlation (defined either absolutely or relatively with respect to the
correlation distribution for each query), or with the rank position of
documents in the ordering induced from the correlation coefficient (for
example by defining the retrieved subset to contain the k highest
correlating documents). The common property of any of these alternatives
is that they all yield a sequence of monotone increasing retrieved