Scientific Report No. ISR-10: Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio, Harvard University; Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the total number of retrieved documents. It has been shown above that an increase in the size of the retrieved subset is tantamount to relaxing the requirements for query-document matching. Thus, if it is assumed that the matching function is a statistically significant indicator of relevance (the contrary assumption clearly contradicts its use), precision must decrease as the size of the retrieved subset increases.

As a concrete example, consider the vector indexing model. With respect to a given query $q$, one assumes that the probability that a document $d_i$ is relevant to the query is a monotonic function of the correlation coefficient $\rho(q, d_i)$. Consider the two highest correlating documents $d_{i_1}$ and $d_{i_2}$ which result from search operations for some ensemble of queries. Let $q_1$ be the probability that $d_{i_1}$ is relevant and $q_2$ be the probability that $d_{i_2}$ is relevant. The assumption above implies that averages over the query ensemble will yield estimates for these probabilities such that:

$$q_1 > q_2$$

Now assume that the precision ratio is calculated after each retrieved document (i.e., the cutoff is a function of the retrieval ordering). At cutoff 1, the precision ratio is clearly $q_1$. At cutoff 2, the precision ratio is $(q_1 + q_2)/2$. Since $q_1 > q_2$ implies that $q_1 > (q_1 + q_2)/2$, the precision decreases as the number of documents considered retrieved increases.
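The two-document argument extends to any cutoff: if the relevance probabilities fall strictly with rank, the expected precision at cutoff n is their running mean, which also falls strictly. The following is a minimal sketch of that calculation, not part of the report itself. The function name `expected_precision` and the probability values are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def expected_precision(relevance_probs):
    """Return the expected precision ratio after each cutoff 1..n.

    relevance_probs[k] is the (assumed) probability that the document
    at rank k+1 is relevant. The expected precision at cutoff n is the
    mean of the first n probabilities.
    """
    precisions = []
    running_sum = Fraction(0)
    for cutoff, prob in enumerate(relevance_probs, start=1):
        running_sum += Fraction(prob)
        precisions.append(running_sum / cutoff)
    return precisions


# Hypothetical, strictly decreasing relevance probabilities by rank.
probs = [Fraction(4, 5), Fraction(3, 5), Fraction(2, 5), Fraction(1, 5)]
precisions = expected_precision(probs)

for cutoff, value in enumerate(precisions, start=1):
    print(f"cutoff {cutoff}: expected precision = {value}")
```

With these illustrative values, the first two cutoffs reproduce the case in the text: the precision at cutoff 1 is the first probability, and at cutoff 2 it is the mean of the first two, which is smaller.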