Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
is rarely possible, and tests of such systems usually produce just one
precision-recall pair, or at most three or four quite closely positioned
pairs. In such cases a comparison may be made by making the SMART
results fit in with those of the non-ranking system by choosing cut-offs
in the SMART searches that match the cut-offs made in
the non-ranking system. In a quite simple test comparison, for example, the
35 ADI requests were hand-searched in a KWIC-type concordance of the ADI
Abstracts collection, and the result was compared with the SMART Abstracts
Thesaurus retrieval run (see Section X). The hand searches were based on
four or five keywords for each request, and the final performance of what
was intended to be a medium-precision, medium-recall search was 0.22
precision at 0.72 recall. Comparison with SMART requires an examination of each
individual hand-searched request to see how many documents were retrieved,
followed by the generation of a cut-off in the SMART ranked output at an
identical point to obtain one comparable precision-recall pair. The SMART
result was 0.16 precision at 0.64 recall. Naturally the hand search
benefited from the searcher's free choice of any known synonyms, and
higher recall in the hand search would have required the choice
of further keywords. SMART's fully ranked output would allow high precision
at low recall (0.31 precision, 0.31 recall, cut-off 4 documents), or high
recall (0.84 recall, 0.11 precision, cut-off 33 documents) simply by examining
more or less of the output. Techniques of this type will be used in future
comparisons of SMART and Medlars searches using a common set of documents
and requests.
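
The matched cut-off procedure described above may be sketched in Python as
follows. This is an illustrative sketch only, not the SMART code: the function
names and the toy data in the usage note are hypothetical, and it assumes that
the single comparable pair is obtained by averaging precision and recall over
the requests, which the text does not state.

```python
from typing import Dict, List, Set, Tuple


def precision_recall_at(
    ranked: List[str], relevant: Set[str], cutoff: int
) -> Tuple[float, float]:
    """Precision and recall of the first `cutoff` documents of a ranked list."""
    retrieved = ranked[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def matched_cutoff_average(
    ranked_runs: Dict[str, List[str]],
    relevant_sets: Dict[str, Set[str]],
    hand_counts: Dict[str, int],
) -> Tuple[float, float]:
    """Cut each ranked output at the number of documents the hand search
    retrieved for the same request, then average over requests.

    Assumption: averaging over requests; the source does not give the method.
    """
    pairs = [
        precision_recall_at(ranked_runs[q], relevant_sets[q], hand_counts[q])
        for q in hand_counts
    ]
    n = len(pairs)
    mean_precision = sum(p for p, _ in pairs) / n
    mean_recall = sum(r for _, r in pairs) / n
    return mean_precision, mean_recall
```

Applied to a hypothetical pair of requests, with ranked outputs
{"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6", "d7", "d8"]}, relevant
sets {"q1": {"d1", "d3"}, "q2": {"d6"}} and hand search counts
{"q1": 2, "q2": 4}, the function cuts the first output after 2 documents and
the second after 4. Calling precision_recall_at with one fixed cut-off for
every request (such as 4 or 33 documents) gives the other points on the
ranked output mentioned in the text.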