Scientific Report No. ISR-10
Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Table 5.1 is cumulated into a single set of frequencies, the population ratio estimates for the precision and recall are as shown in Table 5.2 (b). The numbers were chosen to illustrate that the cumulation of observations results in a precision estimate weighted towards the system's performance for queries with a higher than average number of documents retrieved, and a recall estimate weighted towards the system's performance for queries with a higher than average number of documents relevant. Thus for the example shown, the population ratio estimates are biased by the presence of query type 4.

(a) Query Dependent Statistics

    Query Type      Precision      Recall
    1               .7             .7
    2               .5             .5
    3               [illegible]    .5
    4               .1             .1
    Sample Means    .55            .45

(b) Population Dependent Statistics

    Cumulative Frequencies    [illegible in source]
    Population Ratios         [illegible in source]

Table 5.2  Comparison of Precision and Recall Estimates

C. Output Characterization

The model discussed above for describing a set of retrieval operations is generally extended by allowing a parametric characterization of search output, i.e. of the system's retrieval decisions. The