Scientific Report No. IRS-13, Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Although generality tends to vary between requests, an average value for a set of requests serves to characterize a particular series of experiments.

A second purpose of performance measurement is that of making 'external' comparisons between results obtained in different situations, in which generality is expected to differ. Such comparisons may be made even within an experimental test environment, if different request sets or collection sizes are introduced and compared.

A third purpose that may be distinguished is a specific need to interpret experimental results in terms of expected real-life merit, rather than merely comparing different techniques in a laboratory. Experimental tests of the kind conducted by SMART are simulation tests, and any conclusions drawn from the results may need to be presented in a way that would be typical of the performance if the system were being used operationally.

The choice of performance measures is also affected by viewpoint: either that of the user, or that of a researcher seeking fundamental insight into retrieval capability. User satisfaction is restricted to properties "a", "b", and "c" in Figure 1, since a user is interested in examining as few non-relevant items as possible, and as many relevant items as he wishes to see, but he is not concerned about "d", or about the total collection size. From a system efficiency viewpoint, which is of concern in some types of research, the value of "d" and the collection size are needed. For example, test comparisons between situations of differing generality require measures that include "d" if a strict comparison of efficiency is the object.
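The distinction between user-oriented and system-oriented measures can be illustrated with a minimal sketch. Figure 1 is not reproduced here, so the cell meanings are an assumption: "a" relevant and retrieved, "b" non-relevant and retrieved, "c" relevant and not retrieved, "d" non-relevant and not retrieved. The class name `Contingency`, the helper `mean_generality`, and the counts are hypothetical, and the formulas are the conventional definitions of precision, recall, fallout, and generality rather than ones quoted from this passage.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Contingency:
    """Counts for one request. Cell meanings are assumed, not taken from Figure 1."""
    a: int  # relevant and retrieved (assumed)
    b: int  # non-relevant and retrieved (assumed)
    c: int  # relevant and not retrieved (assumed)
    d: int  # non-relevant and not retrieved (assumed)

    @property
    def collection_size(self) -> int:
        return self.a + self.b + self.c + self.d

    # User-oriented measures: only "a", "b", and "c" appear.
    def precision(self) -> float:
        return self.a / (self.a + self.b)

    def recall(self) -> float:
        return self.a / (self.a + self.c)

    # System-oriented measures: "d" (and so the collection size) appears.
    def fallout(self) -> float:
        return self.b / (self.b + self.d)

    def generality(self) -> float:
        return (self.a + self.c) / self.collection_size


def mean_generality(tables) -> float:
    """Average generality over a set of requests."""
    return mean(t.generality() for t in tables)


# Illustrative counts only: same "a", "b", "c", but a tenfold larger collection.
small = Contingency(a=8, b=12, c=2, d=178)
large = Contingency(a=8, b=12, c=2, d=1978)

for name, t in (("small", small), ("large", large)):
    print(name, t.precision(), t.recall(), t.fallout(), t.generality())
```

Under these assumptions the two situations are indistinguishable by precision and recall, while fallout and generality differ because both depend on "d". The methods divide by sums of counts, so a table with no retrieved or no relevant items would raise `ZeroDivisionError`; a fuller version would need to decide how to treat such requests.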
Still more sophisticated techniques may be needed, since correct system efficiency comparisons require adjustment for differing concentrations of documents by subject in different collections, so that the actual collection size can be replaced by the real number of documents within the subject