Scientific Report No. ISR-13
Information Storage and Retrieval
Evaluation Parameters chapter
E. M. Keen, Gerard Salton
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

measures corresponding to the presentation of Fig. 28 show results favoring Cran-1. Rank recall and log precision appear to follow the pattern expected of a user-oriented evaluation. However, additional theoretical work is required to establish the nature of these single-number measures.

7. Techniques for Dissimilar System Comparisons and Operational Testing

Comparisons of semi-automatic systems, such as SMART, with more conventional mechanized or manual systems, such as the Medlars system, introduce many theoretical and practical problems. Although direct comparisons of such dissimilar systems are almost impossible to make, one small part of the problem, concerning performance measurement, can be discussed. This relates to the ability to compare the retrieval performance of a system that produces a ranked output, such as SMART, with that of a system that conventionally uses a search term matching cutoff and retrieves unordered sets of documents whose sizes are generally not controllable.

For experimental systems that use search term matching cutoffs, such as the Cranfield Project with its technique of "coordination levels," full precision versus recall curves can be obtained if very exhaustive search programs are used to establish many cutoff points; the resulting curves can then be compared with the curves produced by SMART. If a direct comparison of this sort is not possible, an alternative is to apply to the non-ranking system a simple random ranking technique that places the relevant documents in random positions within each of the large sets of retrieved documents, as has been done at Cranfield.

For operational system comparisons, however, such exhaustive searching