Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the total number of retrieved documents. It has been shown above that
an increase in the size of the retrieved subset is tantamount to
relaxing the requirements for query-document matching. Thus, if the
matching function is assumed to be a statistically significant
indicator of relevance (the contrary assumption would contradict its
very use), precision must decrease as the size of the retrieved subset
increases.
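The same reasoning extends to an arbitrary cutoff. As a sketch, let $q_j$ denote the probability that the document at rank $j$ is relevant, and assume $q_1 \ge q_2 \ge \cdots$ (the ordering a matching function that indicates relevance is assumed to induce). If the precision ratio at cutoff $k$ is taken as its expected value, it is the running mean $P_k$, and one step of algebra shows that it cannot increase:

$$
P_k = \frac{1}{k}\sum_{j=1}^{k} q_j, \qquad
P_{k+1} - P_k = \frac{(k P_k + q_{k+1}) - (k+1) P_k}{k+1} = \frac{q_{k+1} - P_k}{k+1} \le 0,
$$

because $q_{k+1} \le q_j$ for every $j \le k$ implies $q_{k+1} \le P_k$. The decrease is strict whenever $q_{k+1} < P_k$.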
As a concrete example, consider the vector indexing model.
With respect to a given query $q$, one assumes that the probability that
a document $d_i$ is relevant to the query is a monotonically increasing
function of the correlation coefficient $\rho(q, d_i)$. Consider the two
highest-correlating documents $d_{i_1}$ and $d_{i_2}$ that result from
search operations for some ensemble of queries. Let $q_1$ be the
probability that $d_{i_1}$ is relevant and $q_2$ the probability that
$d_{i_2}$ is relevant. The assumption above implies that averages over
the query ensemble will yield estimates for these probabilities such that:
$$
q_1 > q_2 \tag{5.1\ldots}
$$
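The ranking step behind "the two highest-correlating documents" can be sketched as follows. This passage does not specify the correlation measure, so the cosine coefficient used here, the toy term vectors, and the function names `cosine` and `rank_documents` are all illustrative assumptions, not the report's own procedure.

```python
import math

def cosine(q, d):
    """Cosine correlation between two term-weight vectors of equal length (assumed measure)."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in d))
    return dot / norm if norm else 0.0

def rank_documents(query, docs):
    """Return document indices sorted by decreasing correlation with the query."""
    scores = [(cosine(query, d), i) for i, d in enumerate(docs)]
    scores.sort(key=lambda s: (-s[0], s[1]))  # ties broken by document index
    return [i for _, i in scores]

# Hypothetical 4-term vocabulary: one query and three document vectors.
query = [1, 1, 0, 0]
docs = [
    [0, 1, 1, 0],  # d0 shares one term with the query
    [1, 1, 0, 0],  # d1 matches the query exactly
    [0, 0, 1, 1],  # d2 shares no terms with the query
]
print(rank_documents(query, docs))  # [1, 0, 2]
```

Here d1 and d0 would play the roles of $d_{i_1}$ and $d_{i_2}$; under the monotonicity assumption, d1 is the more likely of the two to be relevant.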
Now assume that the precision ratio is calculated after each retrieved
document (i.e., the cutoff is a function of the retrieval ordering).
Averaged over the query ensemble, the precision ratio at cutoff 1 is
clearly $q_1$, and at cutoff 2 it is $(q_1 + q_2)/2$. Since $q_1 > q_2$
implies $q_1 > (q_1 + q_2)/2$, the precision decreases as the number of
documents considered retrieved increases.
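The cutoff-by-cutoff computation can be checked numerically. The sketch below assumes that the expected precision at cutoff $k$ is the mean of the first $k$ relevance probabilities. The probability values and the name `expected_precision` are hypothetical, and exact fractions are used so that the printed ratios are not affected by floating-point rounding.

```python
from fractions import Fraction

def expected_precision(probs):
    """Expected precision ratio after each retrieved document.

    probs[j] is the (assumed) probability that the document at rank j+1
    is relevant; the expected precision at cutoff k is the mean of the
    first k probabilities.
    """
    result = []
    total = Fraction(0)
    for k, p in enumerate(probs, start=1):
        total += p
        result.append(total / k)
    return result

# Hypothetical probabilities that decrease with rank, as the monotonicity assumption requires.
q = [Fraction(9, 10), Fraction(6, 10), Fraction(3, 10)]
print([str(p) for p in expected_precision(q)])  # ['9/10', '3/4', '3/5']
```

The second value, 3/4, is the $(q_1 + q_2)/2$ of the text, and it lies below $q_1 = 9/10$. Each later value is also no larger than the one before it.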