Scientific Report No. ISR-10
Information Storage and Retrieval
Evaluation of Document Retrieval Systems (chapter)
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

for some larger collection from which [...] is a representative random sample.

Consider now the typical evaluation situation in which a number of retrieval operations are performed on some sample set of search requests. In the conceptual framework of the idealized experiment, one assumes the existence of a universe of queries from which the sample set is drawn at random. According to the above characterization, each retrieval operation results in a particular estimate for the joint probability distribution of the user/system decisions, applicable to the input query. Let the query sample contain $m$ elements. The results of the $m$ retrieval operations may be summarized by $m$ 4-tuples:

$$(p_{1i},\, p_{2i},\, p_{3i},\, p_{4i}), \qquad i = 1, \ldots, m$$

where the $p_k$'s are defined by equations (5.1) to (5.4). Each of the 4-tuples, in addition to defining the probabilities of the sample points of Figure 5.1(a), defines a set of conditional probabilities such as are given by equations (5.5) to (5.8).

In terms of the probabilistic model, the behaviour of a retrieval system is completely specified by the 4-tuple $(p_1, p_2, p_3, p_4)$ of each query (and associated user relevance decisions) in the universe of queries, or query sample space, of the system (which for convenience is assumed to be discrete). This sample space defines a joint probability distribution of four random variables $P_1$, $P_2$, $P_3$, and $P_4$ given by: