Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
$r_i$), $i = 1, \ldots, m$, where

$$\text{precision of the } i\text{th query} = p_i = \frac{n_1^i}{n_1^i + n_2^i} \qquad (5.9)$$

and

$$\text{recall of the } i\text{th query} = r_i = \frac{n_1^i}{n_1^i + n_3^i} \qquad (5.10)$$
where the $n_j^i$ are defined by Figure 5.1(b). The m couples $(p_i, r_i)$
provide an estimate of the joint probability distribution of the
random variables P and R, defined by:

$$\Pr(P = p_j,\ R = r_k) = f(p_j, r_k), \qquad j, k = 1, 2, \ldots$$
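The passage above turns per-query counts into couples $(p_i, r_i)$ and treats those couples as a sample from the joint distribution of P and R. The sketch below illustrates both steps under stated assumptions: it reads the counts of Figure 5.1(b) as n1 = relevant and retrieved, n2 = retrieved but not relevant, n3 = relevant but not retrieved (the reading implied by the denominators of (5.9) and (5.10)), and it uses a plain relative-frequency estimate for the joint distribution, which the text does not specify. The function names and the sample counts are hypothetical.

```python
from collections import Counter
from fractions import Fraction

def precision_recall(n1, n2, n3):
    """Couple (p_i, r_i) for one query, following (5.9) and (5.10).

    Assumed reading of Figure 5.1(b): n1 = relevant and retrieved,
    n2 = retrieved but not relevant, n3 = relevant but not retrieved.
    Raises ZeroDivisionError if a denominator is zero.
    """
    return Fraction(n1, n1 + n2), Fraction(n1, n1 + n3)

def joint_distribution(couples):
    """Relative-frequency estimate of Pr(P = p_j, R = r_k).

    Each distinct couple receives (number of queries producing it) / m.
    """
    m = len(couples)
    return {pr: Fraction(k, m) for pr, k in Counter(couples).items()}

# Hypothetical counts (n1, n2, n3) for m = 4 queries.
counts = [(1, 1, 0), (2, 2, 0), (3, 1, 6), (2, 0, 2)]
couples = [precision_recall(*c) for c in counts]
dist = joint_distribution(couples)
print(dist[(Fraction(1, 2), Fraction(1))])  # 1/2
```

Exact fractions are used so that identical couples from different queries compare equal as dictionary keys; with floating-point ratios, two mathematically equal couples could land in separate cells.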
The respective expectations of these random variables, E(P) and E(R),
are estimated by the sample means:
$$\bar{p} = \frac{1}{m}\sum_{i=1}^{m} p_i = \frac{1}{m}\sum_{i=1}^{m} \frac{n_1^i}{n_1^i + n_2^i} \qquad (5.11)$$

$$\bar{r} = \frac{1}{m}\sum_{i=1}^{m} r_i = \frac{1}{m}\sum_{i=1}^{m} \frac{n_1^i}{n_1^i + n_3^i} \qquad (5.12)$$
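The sample means (5.11) and (5.12) average the per-query ratios, so every query carries weight 1/m regardless of how many documents it retrieves or how many relevant documents it has. A minimal sketch, assuming each query is given as a hypothetical tuple (n1, n2, n3) with n1 + n2 the number retrieved and n1 + n3 the number relevant:

```python
from fractions import Fraction

def mean_precision_recall(counts):
    """Sample means p_bar (5.11) and r_bar (5.12) over m queries.

    counts: list of (n1, n2, n3) tuples, one per query (assumed layout).
    Each query's ratio is computed first, then the m ratios are averaged.
    """
    m = len(counts)
    p_bar = sum(Fraction(n1, n1 + n2) for n1, n2, _ in counts) / m
    r_bar = sum(Fraction(n1, n1 + n3) for n1, _, n3 in counts) / m
    return p_bar, r_bar

# Hypothetical counts for m = 3 queries.
counts = [(6, 4, 2), (3, 1, 3), (5, 5, 0)]
p_bar, r_bar = mean_precision_recall(counts)
print(p_bar, r_bar)  # 37/60 3/4
```

Here the per-query precisions are 3/5, 3/4 and 1/2, and the recalls are 3/4, 1/2 and 1; their averages give 37/60 and 3/4.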
The Cranfield data were interpreted in a different manner. In
particular, the precision and recall estimates were computed according
to the equations: