Scientific Report No. ISR-13, Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
measures corresponding to the presentation of Fig. 28 show results favoring
Cran-1. Rank recall and log precision appear to follow the pattern expected
of a user-oriented evaluation. However, additional theoretical work is
required to establish the nature of these single-number measures.
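The two single-number measures mentioned above are usually defined from the ranks at which the relevant documents appear in the output: rank recall compares the sum of the ideal ranks 1, 2, ..., n with the sum of the actual ranks, and log precision makes the same comparison on logarithms of the ranks. A minimal sketch under that reading of the definitions (the function names are mine, not the report's):

```python
from math import log

def rank_recall(relevant_ranks):
    """Sum of ideal ranks 1..n divided by sum of actual ranks."""
    n = len(relevant_ranks)
    ideal = sum(range(1, n + 1))
    return ideal / sum(relevant_ranks)

def log_precision(relevant_ranks):
    """Sum of log(i) for i = 1..n divided by sum of log(actual rank)."""
    n = len(relevant_ranks)
    ideal = sum(log(i) for i in range(1, n + 1))
    return ideal / sum(log(r) for r in relevant_ranks)

# A perfect ranking (relevant documents at ranks 1, 2, 3) scores 1.0
# on both measures; pushing relevant documents down lowers the score.
print(rank_recall([1, 2, 3]), rank_recall([2, 4, 6]))
print(log_precision([1, 2, 3]), log_precision([2, 4, 6]))
```

Both measures reach 1.0 only when the n relevant documents occupy the top n ranks, which is why they plausibly track a user's view of ranking quality.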
7. Techniques for Dissimilar System Comparisons and Operational Testing
Comparisons of semi-automatic systems, such as SMART, with more
conventional mechanized or manual systems, such as the Medlars system,
introduce many theoretical and practical problems. Although direct
comparisons of such dissimilar systems are almost impossible to make, one
small part of the problem, concerning performance measurement, can be discussed.
This relates to the ability to compare the retrieval performance of a system
that produces ranked output, such as SMART, with that of a system that
conventionally uses a search-term matching cutoff and retrieves unordered
sets of documents of generally uncontrollable size.
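The measurement difficulty can be made concrete: an unordered retrieved set yields only a single precision-recall point, whereas a ranked list yields one point per rank cutoff, and hence a full curve. A minimal illustration (the document identifiers and helper names here are invented for the example):

```python
def pr_point(retrieved, relevant):
    """Unordered set retrieval: one (precision, recall) pair."""
    hits = len(set(retrieved) & set(relevant))
    return hits / len(retrieved), hits / len(relevant)

def pr_curve(ranking, relevant):
    """Ranked output: one (precision, recall) pair per rank cutoff."""
    rel = set(relevant)
    points, hits = [], 0
    for k, doc in enumerate(ranking, start=1):
        if doc in rel:
            hits += 1
        points.append((hits / k, hits / len(rel)))
    return points

# Relevant documents 1 and 4; the cutoff system retrieves {1, 2, 3, 4}
# as one unordered set, while the ranking system orders them 1, 2, 3, 4.
print(pr_point([1, 2, 3, 4], [1, 4]))   # a single point
print(pr_curve([1, 2, 3, 4], [1, 4]))   # a point at every cutoff
```

The single point from the cutoff system can only be matched against the one point on the ranked system's curve taken at the same retrieved-set size, which is the heart of the comparison problem discussed above.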
For experimental systems that use search-term matching cutoffs, such
as the Cranfield Project, which uses techniques of "coordination levels," it
is possible to obtain full precision versus recall curves if very exhaustive
search programs are used to establish many cut-off points; the resulting
curves can then be compared to the curves produced by SMART. If a direct
comparison of this sort is not possible, then an alternative is to apply to
the non-ranking system a simple random ranking technique that places relevant
documents in random positions in each of the large sets of retrieved
documents, as has been done at Cranfield.
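One way to read the Cranfield device described above: within each coordination-level set the output order carries no information, so the documents, relevant ones included, may be placed at random positions inside their set, and the sets concatenated from highest coordination level downward to form a single ranking from which a precision-recall curve can then be computed. A hypothetical sketch of this step (function name and seeding are my own, not Cranfield's procedure in detail):

```python
import random

def random_coordination_ranking(level_sets, seed=None):
    """Concatenate coordination-level sets (highest level first),
    shuffling each set so every document, relevant or not, lands
    at a random position within its own level."""
    rng = random.Random(seed)
    ranking = []
    for docs in level_sets:
        docs = list(docs)
        rng.shuffle(docs)
        ranking.extend(docs)
    return ranking

# Two coordination levels: {a, b} matched more query terms than {c, d, e}.
levels = [{"a", "b"}, {"c", "d", "e"}]
print(random_coordination_ranking(levels, seed=0))
```

The resulting pseudo-ranking preserves the between-level ordering that the cutoff system actually provides, while making no claim about within-level order, so curves derived from it can be set beside those of a genuinely ranking system such as SMART.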
For operational system comparisons, however, such exhaustive searching