Scientific Report No. ISR-10
Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

of a retrieval operation. A user presents a search request to the system, which then compares an index language representation of it to the index images of the documents in some collection. Each comparison results in a binary decision to retrieve or not retrieve the reference document. Independently of the system, it is assumed that the user has made a binary relevance judgment with respect to his information needs (represented by the search request) and the content of each document.

The possible results of such an experiment with respect to a single reference document may be represented by the discrete sample space shown in Figure 5.1 (a). Assuming this sample space, estimates of the probabilities associated with each of the sample points (i.e., of the joint probability distribution of the user/system decisions) can be produced by tabulating the number of occurrences of each of the possible outcomes over all of the documents or trials which comprise a single retrieval operation. This is represented by the 2-by-2 contingency table of Figure 5.1 (b), where the ratio of each of the numbers shown to the total number of documents represents the estimate of the probability of the corresponding sample point, i.e.:

$$p_1 = \frac{n_1}{N} = \Pr\{\text{Retrieval and Relevance}\} \qquad (5.1)$$

$$p_2 = \frac{n_2}{N} = \Pr\{\text{Retrieval and Nonrelevance}\} \qquad (5.2)$$

and

$$p_3 = \frac{n_3}{N} = \Pr\{\text{Nonretrieval and Relevance}\} \qquad (5.3)$$
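
The tabulation described above can be illustrated with a minimal Python sketch. It assumes that the system's retrieval decisions and the user's relevance judgments are available as two parallel lists of booleans, one entry per document. The function name `estimate_joint_probabilities` and the label `n4` for the fourth cell (nonretrieval and nonrelevance) are illustrative assumptions and do not appear in the text above.

```python
from collections import Counter
from fractions import Fraction


def estimate_joint_probabilities(retrieved, relevant):
    """Tabulate the 2-by-2 contingency table for one retrieval operation
    and return the counts together with the estimates n_i / N.

    retrieved[i] -- the system's binary decision for document i
    relevant[i]  -- the user's binary relevance judgment for document i
    """
    if len(retrieved) != len(relevant):
        raise ValueError("one decision and one judgment are needed per document")
    total = len(retrieved)  # N, the number of documents (trials)
    if total == 0:
        raise ValueError("the collection must contain at least one document")

    # Count how often each (retrieved, relevant) outcome occurs.
    outcomes = Counter((bool(s), bool(u)) for s, u in zip(retrieved, relevant))

    counts = {
        "n1": outcomes[(True, True)],    # retrieval and relevance
        "n2": outcomes[(True, False)],   # retrieval and nonrelevance
        "n3": outcomes[(False, True)],   # nonretrieval and relevance
        "n4": outcomes[(False, False)],  # assumed label for the remaining cell
    }
    # Each estimate is the ratio of a cell count to the total number of documents.
    estimates = {
        name.replace("n", "p"): Fraction(count, total)
        for name, count in counts.items()
    }
    return counts, estimates


# Hypothetical data for six documents, for illustration only.
example_retrieved = [True, True, False, False, True, False]
example_relevant = [True, False, True, False, True, False]
example_counts, example_estimates = estimate_joint_probabilities(
    example_retrieved, example_relevant
)
```

Exact fractions are used here so that the four estimates visibly sum to one; floating-point ratios would serve equally well.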