ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Table 5.1 is cumulated into a single set of frequencies, the population
ratio estimates for the precision and recall are as shown in Table 5.2
(b). The numbers were chosen to illustrate that the cumulation of
observations results in a precision estimate weighted towards the
system's performance for queries with a higher than average number of
documents retrieved, and a recall estimate weighted towards the system's
performance for queries with a higher than average number of documents
relevant. Thus for the example shown, the population ratio estimates
are biased by the presence of query type 4.

Query       Precision    Recall
Type
  1            .7          .7
  2            .5          .5
  3            .9          .5
  4            .1          .1

Sample         .55         .45
Means

(a) Query Dependent Statistics

Cumulative Frequencies: Population Ratios

  precision = 26/[illegible]
  recall    = 26/88 = .30

(b) Population Dependent Statistics

Comparison of Precision and Recall Estimates
Table 5.2
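
The difference between the two kinds of estimate can be sketched in a few lines of Python. This is only an illustration: the per-query counts below are hypothetical (they are not the entries of Table 5.1), chosen so that one query retrieves far more documents, and has far more relevant documents, than the others. The names queries, sample_means and population_ratios are likewise assumptions made for this sketch.

```python
from fractions import Fraction

# Hypothetical per-query counts, NOT taken from the report:
# query type -> (documents retrieved, documents relevant, relevant documents retrieved)
queries = {
    1: (10, 10, 7),
    2: (10, 10, 5),
    3: (10, 18, 9),
    4: (50, 50, 5),
}

def sample_means(counts):
    """Average the per-query precision and recall; every query weighs equally."""
    precisions = [Fraction(m, n) for n, c, m in counts.values()]
    recalls = [Fraction(m, c) for n, c, m in counts.values()]
    k = len(counts)
    return sum(precisions) / k, sum(recalls) / k

def population_ratios(counts):
    """Cumulate the counts over all queries, then form one precision and one recall."""
    total_retrieved = sum(n for n, c, m in counts.values())
    total_relevant = sum(c for n, c, m in counts.values())
    total_hits = sum(m for n, c, m in counts.values())
    return Fraction(total_hits, total_retrieved), Fraction(total_hits, total_relevant)

mean_p, mean_r = sample_means(queries)
pop_p, pop_r = population_ratios(queries)
print("sample means:      precision", mean_p, " recall", mean_r)
print("population ratios: precision", pop_p, " recall", pop_r)
```

With these assumed counts, query type 4 supplies 50 of the 80 documents retrieved and 50 of the 88 documents relevant, so its low per-query ratios dominate the cumulated estimates, while in the sample means it carries the same weight as each of the other three queries.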
C. Output Characterization
The model discussed above for describing a set of retrieval
operations is generally extended by allowing a parametric
characterization of search output, i.e. of the system's retrieval decisions. The