Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
\frac{1}{N} \sum_{i=1}^{N} h(p_i, r_i)
i.e. the sample mean, which is a function of the sample distribution
of the precision and recall conditional probabilities and not of the
population ratio estimates.
A numerical example may serve to illustrate the preceding
points. Assume that a sample set of test queries produces results
which can be placed in the four categories shown in Table 5.1. It
is implied in this hypothetical case that each of the observations is
representative of some large subset of input queries of the test sample,
so that it can be assumed that the four query types represent equally
probable subclasses of the query sample space.
                  n1               n2                 n3
Query Type    Relevant &      Nonrelevant &       Relevant &
              Retrieved        Retrieved         Not Retrieved
    1             7                3                  3
    2             5                5                  5
    3        [illegible]           1                  9
    4             5               45                 45
Retrieval Results for 4 Equally Probable Query Types
Table 5.1
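
The distinction drawn above between the sample mean and the population ratio estimate can be checked numerically with a short sketch. It assumes the usual definitions, precision = n1 / (n1 + n2) and recall = n1 / (n1 + n3), and uses only the query types whose counts are legible in Table 5.1. Query type 3 is left out because one of its counts cannot be read, and the counts for query type 4 (5, 45, 45) are taken as read from the scan. The names `COUNTS`, `sample_mean`, and `pooled_ratio` are illustrative and do not come from the report.

```python
from fractions import Fraction

# (n1, n2, n3) per query type, as read from Table 5.1:
#   n1 = relevant and retrieved
#   n2 = nonrelevant and retrieved
#   n3 = relevant and not retrieved
# Assumption: query type 3 is omitted because one count is illegible,
# and the row for query type 4 is read as 5, 45, 45.
COUNTS = {1: (7, 3, 3), 2: (5, 5, 5), 4: (5, 45, 45)}


def precision(n1, n2, n3):
    """Fraction of retrieved documents that are relevant."""
    return Fraction(n1, n1 + n2)


def recall(n1, n2, n3):
    """Fraction of relevant documents that are retrieved."""
    return Fraction(n1, n1 + n3)


def sample_mean(measure, counts):
    """Average the per-query-type values of `measure` (equal weights)."""
    values = [measure(*row) for row in counts.values()]
    return sum(values) / len(values)


def pooled_ratio(measure, counts):
    """Apply `measure` once to the counts summed over all query types."""
    totals = [sum(column) for column in zip(*counts.values())]
    return measure(*totals)


if __name__ == "__main__":
    for name, measure in (("precision", precision), ("recall", recall)):
        mean = sample_mean(measure, COUNTS)
        pooled = pooled_ratio(measure, COUNTS)
        print(f"{name}: sample mean = {float(mean):.3f}, "
              f"pooled ratio = {float(pooled):.3f}")
```

For these three rows the sample mean of both precision and recall is 13/30 (about 0.433), while the pooled ratio is 17/70 (about 0.243). The gap arises because the pooled ratio is dominated by query type 4, which contributes far more documents than the other types even though each type is assumed equally probable.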
Table 5.2(a) shows the precision and recall sample distributions
and the sample mean estimators for the averages of these random
variables over the query sample space. If, however, the data from