ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
of a retrieval operation. A user presents a search request to the
system which then compares an index language representation of it to
the index images of the documents in some collection. Each comparison
results in a binary decision to retrieve or not retrieve the reference
document. Independently of the system, it is assumed that the user
has made a binary relevance judgment with respect to his information
needs (represented by the search request) and the content of each
document. The possible results of such an experiment with respect to
a single reference document may be represented by the discrete sample
space shown in Figure 5.1 (a). Assuming this sample space, estimates
of the probabilities associated with each of the sample points (i.e.
of the joint probability distribution of the user/system decisions)
can be produced by tabulating the number of occurrences of each of the
possible outcomes over all of the documents or trials which comprise a
single retrieval operation. This is represented by the 2-by-2
contingency table of Figure 5.1 (b), where the ratio of each of the
numbers shown to the total number of documents represents the estimate
of the probability of the corresponding sample point, i.e.:
\[ p_1 = \frac{n_1}{N} = \Pr\{\text{Retrieval and Relevance}\} \tag{5.1} \]
\[ p_2 = \frac{n_2}{N} = \Pr\{\text{Retrieval and Nonrelevance}\} \tag{5.2} \]
\[ p_3 = \frac{n_3}{N} = \Pr\{\text{Nonretrieval and Relevance}\} \tag{5.3} \]
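The tabulation described above can be illustrated with a minimal sketch, assuming the system decisions and the user judgments are available as two parallel lists of booleans, one entry per document. The function name `contingency_estimates`, the dictionary keys, and the sample data are hypothetical and chosen only for illustration; the fourth cell (nonretrieval and nonrelevance) is likewise an assumption, included only to complete the 2-by-2 table.

```python
from collections import Counter
from fractions import Fraction


def contingency_estimates(retrieved, relevant):
    """Tabulate the 2-by-2 contingency table for one retrieval operation.

    retrieved[i] is the system's binary decision for document i;
    relevant[i] is the user's independent binary relevance judgment.
    Returns (cells, probs), where each probability is cell count / N.
    """
    if len(retrieved) != len(relevant):
        raise ValueError("one decision and one judgment are needed per document")
    total = len(retrieved)
    if total == 0:
        raise ValueError("the collection must contain at least one document")

    # Count each (system decision, user judgment) outcome over all documents.
    counts = Counter((bool(s), bool(u)) for s, u in zip(retrieved, relevant))
    cells = {
        "n1": counts[(True, True)],    # retrieval and relevance
        "n2": counts[(True, False)],   # retrieval and nonrelevance
        "n3": counts[(False, True)],   # nonretrieval and relevance
        "n4": counts[(False, False)],  # assumed fourth cell of the table
    }
    # Each estimate is the ratio of a cell count to the total number of documents.
    probs = {"p" + name[1:]: Fraction(n, total) for name, n in cells.items()}
    return cells, probs


# Hypothetical collection of eight documents.
retrieved = [True, True, True, False, False, False, False, False]
relevant = [True, False, True, True, False, False, False, False]
cells, probs = contingency_estimates(retrieved, relevant)
```

Exact fractions are used rather than floating-point values so that the estimates can be checked to sum to one without rounding error.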