ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
5-15
data obtained in this manner can usually be interpreted as portraying
the variation of the joint probability distribution of the user/system
decisions as the system's retrieval criterion is relaxed. A typical
example of this type of system characterization is given by a precision
vs. recall plot such as the one shown in Figure 5.10.
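As an illustration of how such a plot is obtained, the following minimal sketch computes one (recall, precision) point per cutoff as the retrieval criterion is relaxed. The function name `precision_recall_curve`, the document identifiers, the correlation values, and the relevance judgments are all hypothetical and are not taken from the report; the sketch also assumes that at least one document is judged relevant.

```python
def precision_recall_curve(scores, relevant):
    """Return (recall, precision) pairs as the cutoff is relaxed.

    scores:   dict mapping document id -> query-document correlation
              (hypothetical values, for illustration only)
    relevant: non-empty set of document ids judged relevant by the user
    """
    # Rank documents from highest to lowest correlation with the query.
    ranked = sorted(scores, key=scores.get, reverse=True)
    points = []
    hits = 0
    # Relaxing the criterion admits one more document at each step.
    for n_retrieved, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        recall = hits / len(relevant)
        precision = hits / n_retrieved
        points.append((recall, precision))
    return points


# Hypothetical toy data.
scores = {"d1": 0.9, "d2": 0.7, "d3": 0.4, "d4": 0.2}
relevant = {"d1", "d3"}
curve = precision_recall_curve(scores, relevant)
for recall, precision in curve:
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```

Each successive point corresponds to a looser retrieval criterion, so recall can only stay level or rise along the sequence while precision may move in either direction.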
Within the framework of the functional model, the result of a
retrieval operation has been characterized by two essentially different
forms. Thus, as described in Chapter 4, the set inclusion query-document
matching function leads to a natural partition of the reference
collection into the retrieved and not retrieved subsets. The other
matching functions considered (correlation processes) require the
specification of a cutoff or decision criterion to induce such a
partition. It is shown below that the common means used to vary the
size of the retrieved subset under set inclusion matching is, in fact,
equivalent to the use of the set overlap correlation function.
Consider a query containing $n_q$ keywords. With set inclusion
matching, the retrieved subset $R$ contains all document images containing
at least all $n_q$ query keywords. Define now a subset $R(k)$ which
contains all documents that include at least $k$ of the $n_q$ keywords of
the query. A uniformly decreasing sequence of values for $k$ from $n_q$ to
1 produces a sequence of retrieved subsets satisfying:

$$R(n_q) \subset R(n_q - 1) \subset \cdots \subset R(2) \subset R(1)$$
The retrieved subset $R(k)$ is thus monotonically increasing with
decreasing values of the cutoff parameter $k$.
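The nesting of the retrieved subsets can be checked on a small example. The sketch below is only an illustration under stated assumptions: the helper names `overlap` and `retrieved`, the query keywords, and the four-document collection are hypothetical, and the set overlap correlation is taken here to be the plain count of keywords shared by query and document.

```python
def overlap(query, doc_keywords):
    """Set overlap, assumed here to be the number of shared keywords."""
    return len(query & doc_keywords)


def retrieved(query, collection, k):
    """R(k): documents containing at least k of the query keywords."""
    return {doc_id for doc_id, keywords in collection.items()
            if overlap(query, keywords) >= k}


# Hypothetical query and document images.
query = {"retrieval", "evaluation", "recall"}
collection = {
    "d1": {"retrieval", "evaluation", "recall", "precision"},
    "d2": {"retrieval", "recall"},
    "d3": {"evaluation", "indexing"},
    "d4": {"thesaurus"},
}

n_q = len(query)
# Relax the cutoff k from n_q down to 1.
subsets = [retrieved(query, collection, k) for k in range(n_q, 0, -1)]

# Set inclusion matching: documents containing every query keyword.
inclusion = {doc_id for doc_id, keywords in collection.items()
             if query <= keywords}

for k, subset in zip(range(n_q, 0, -1), subsets):
    print(k, sorted(subset))
```

With the cutoff set to the number of query keywords, the overlap threshold retrieves exactly the set inclusion result, and each lower cutoff can only add documents to the retrieved subset.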