Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
for some larger collection from which it is a representative random
sample.
Consider now the typical evaluation situation in which a
number of retrieval operations are performed on some sample set of
search requests. In the conceptual framework of the idealized
experiment, one assumes the existence of a universe of queries from which
the sample set is drawn at random. According to the above
characterization, each retrieval operation yields a particular estimate of
the joint probability distribution of the user/system decisions
for the input query. Let the query sample contain m
elements. The results of the m retrieval operations may then be summarized
by m 4-tuples:
$$\left(p_1^{i},\; p_2^{i},\; p_3^{i},\; p_4^{i}\right), \qquad i = 1, \dots, m$$
where the $p_k^{i}$ are defined by equations (5.1) to (5.4). In addition to
defining the probabilities of the sample points of Figure 5.1 (a), each
4-tuple defines a set of conditional probabilities such as those given by
equations (5.5) to (5.8).
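As a rough illustration of how m retrieval operations reduce to m 4-tuples, the Python sketch below estimates each 4-tuple as relative frequencies of document counts and derives two example conditional probabilities from it. It rests on stated assumptions: the cell ordering of p1 to p4 is hypothetical (the actual definitions are equations (5.1) to (5.4), not reproduced here), the relative-frequency estimator is one natural choice rather than necessarily the report's, the two conditionals shown are merely of the kind given by equations (5.5) to (5.8), and all function names and sample counts are invented for illustration.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL cell ordering; the report's actual definitions are
# equations (5.1)-(5.4), which are not reproduced in this section:
#   p1 = P(relevant, retrieved)      p2 = P(nonrelevant, retrieved)
#   p3 = P(relevant, not retrieved)  p4 = P(nonrelevant, not retrieved)
FourTuple = Tuple[float, float, float, float]


def estimate_tuple(rel_ret: int, nonrel_ret: int,
                   rel_notret: int, nonrel_notret: int) -> FourTuple:
    """Estimate (p1, p2, p3, p4) for one query as relative frequencies
    of the user/system decisions over the documents of the collection."""
    n = rel_ret + nonrel_ret + rel_notret + nonrel_notret
    if n <= 0:
        raise ValueError("the collection must contain at least one document")
    return (rel_ret / n, nonrel_ret / n, rel_notret / n, nonrel_notret / n)


def conditionals(p: FourTuple) -> Dict[str, float]:
    """Two example conditional probabilities implied by one 4-tuple
    (NaN when the conditioning event has probability zero)."""
    p1, p2, p3, _p4 = p
    relevant = p1 + p3   # P(relevant) under the assumed ordering
    retrieved = p1 + p2  # P(retrieved) under the assumed ordering
    return {
        "retrieved_given_relevant": p1 / relevant if relevant else float("nan"),
        "relevant_given_retrieved": p1 / retrieved if retrieved else float("nan"),
    }


def summarize(counts: List[Tuple[int, int, int, int]]) -> List[FourTuple]:
    """Turn the results of m retrieval operations into m 4-tuples."""
    return [estimate_tuple(*c) for c in counts]


if __name__ == "__main__":
    # Made-up counts for m = 2 queries against a 100-document collection.
    sample = [(4, 6, 1, 89), (2, 3, 2, 93)]
    for i, t in enumerate(summarize(sample), start=1):
        print(i, t, conditionals(t))
```

Keeping the 4-tuple as the unit of data mirrors the text: once a query's tuple is known, any conditional probability over the same sample points can be computed from it without returning to the raw counts.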
In terms of the probabilistic model, the behavior of a
retrieval system is completely specified by the 4-tuple $(p_1, p_2, p_3, p_4)$
of each query (and its associated user relevance decisions) in the
universe of queries, or query sample space, of the system (which for
convenience is assumed to be discrete). This sample space defines a
joint probability distribution of four random variables $P_1$, $P_2$, $P_3$,
and $P_4$, given by: