Scientific Report No. ISR-10
Information Storage and Retrieval
Evaluation of Document Retrieval Systems (chapter)
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

data obtained in this manner can usually be interpreted as portraying the variation of the joint probability distribution of the user/system decisions as the system's retrieval criterion is relaxed. A typical example of this type of system characterization is given by a precision vs. recall plot such as the one shown in Figure 5.10.

Within the framework of the functional model, the result of a retrieval operation has been characterized by two essentially different forms. Thus, as described in Chapter 4, the set inclusion query-document matching function leads to a natural partition of the reference collection into the retrieved and not retrieved subsets. The other matching functions considered (correlation processes) require the specification of a cutoff or decision criterion to induce such a partition. It is shown below that the common means used to vary the size of the retrieved subset under set inclusion matching is, in fact, equivalent to the use of the set overlap correlation function.

Consider a query containing $n_q$ keywords. With set inclusion matching, the retrieved subset $R$ contains all document images containing at least all $n_q$ query keywords. Define now a subset $R(k)$ which contains all documents that include at least $k$ of the $n_q$ keywords of the query. A uniformly decreasing sequence of values for $k$ from $n_q$ to 1 produces a sequence of retrieved subsets satisfying:

$$R(n_q) \subset R(n_q - 1) \subset \cdots \subset R(2) \subset R(1)$$

The retrieved subset $R(k)$ is thus monotonically increasing with decreasing values of the cutoff parameter $k$.
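The nesting of the subsets R(k) can be illustrated with a small sketch. It is not taken from the report: the function names `retrieved_subset` and `nested_subsets`, the document identifiers, and the toy keyword sets are all hypothetical, and each document image is assumed to be a plain set of keywords. The cutoff k is applied to the number of keywords shared by the query and the document, which is one simple reading of a set overlap measure.

```python
def retrieved_subset(query, collection, k):
    """Return the ids of documents sharing at least k keywords with the query.

    `collection` maps a document id to its set of keywords (an assumed,
    simplified stand-in for a document image).
    """
    q = set(query)
    return {doc_id for doc_id, terms in collection.items()
            if len(q & set(terms)) >= k}


def nested_subsets(query, collection):
    """Build R(k) for k = n_q, n_q - 1, ..., 1, where n_q is the query size."""
    n_q = len(set(query))
    return {k: retrieved_subset(query, collection, k)
            for k in range(n_q, 0, -1)}


# Hypothetical toy collection, for illustration only.
collection = {
    "d1": {"retrieval", "evaluation", "recall"},
    "d2": {"retrieval", "recall"},
    "d3": {"evaluation"},
    "d4": {"indexing"},
}
query = ["retrieval", "evaluation", "recall"]

subsets = nested_subsets(query, collection)
for k in sorted(subsets, reverse=True):
    print(k, sorted(subsets[k]))
```

With k equal to the number of query keywords, the sketch reduces to strict set inclusion matching; lowering k can only add documents, so each R(k) is contained in R(k - 1).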