Scientific Report No. IRS-13
Information Storage and Retrieval
Chapter: Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

areas covered by a set of requests. No suitable method of achieving this type of comparison has yet been developed, but it is crucial to further research in this area, because some collections are clearly more hostile to good retrieval performance simply because they contain a large number of potentially retrievable non-relevant items.

Four desirable properties of retrieval performance measures are suggested by John Swets [3], namely that the measure should be:
- able to measure retrieval effectiveness alone, separately from other criteria such as cost;
- independent of any particular cut-off;
- a single number;
- on a number scale that gives absolute and relative values.

Swets, however, does not recognize the possibility that different purposes and measurement viewpoints may be important, and the measure he proposes takes no account of the user viewpoint in a directly meaningful way. From matters discussed already, several other properties appear desirable:
- ability to reflect the success of the system in meeting needs of different types, such as high precision or high recall;
- ability to interpret measures directly in terms of a user's experience: for example, 0.2 precision at 0.5 recall means that the user has examined half the relevant documents available, while at the same time looking at four non-relevant items for every relevant one;
- ability to compare systems of differing generality.

Other properties can be suggested, but the purposes and viewpoints suggested here should override such properties as the "single number" or "absolute and relative scales", which are perhaps desirable but not essential.
The purposes, viewpoints and properties discussed are summarized in Figure 2.
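The user-oriented reading of a precision-recall pair given above (0.2 precision at 0.5 recall) can be checked with a little arithmetic. The following is a minimal sketch, not part of the original report: the function names and the collection size of 20 relevant documents are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def nonrelevant_per_relevant(precision):
    """Non-relevant items examined for each relevant item examined.

    Precision is relevant_seen / total_seen, so the ratio of
    non-relevant to relevant items seen is (1 - precision) / precision.
    """
    p = Fraction(precision)
    return (1 - p) / p


def items_examined(total_relevant, recall, precision):
    """Return (relevant_seen, nonrelevant_seen) at a given operating point.

    total_relevant is an assumed count of relevant documents in the
    collection for one request (hypothetical, for illustration only).
    """
    relevant_seen = Fraction(recall) * total_relevant
    total_seen = relevant_seen / Fraction(precision)
    return relevant_seen, total_seen - relevant_seen


# The example from the text: 0.2 precision at 0.5 recall.
ratio = nonrelevant_per_relevant(Fraction(1, 5))
relevant_seen, nonrelevant_seen = items_examined(20, Fraction(1, 2), Fraction(1, 5))
print(ratio)                            # 4
print(relevant_seen, nonrelevant_seen)  # 10 40
```

With an assumed 20 relevant documents, the user would have seen 10 of them (half) along with 40 non-relevant items, which is the four-to-one ratio stated in the text. Exact fractions are used rather than floats so that the ratio comes out as exactly 4.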