Scientific Report No. IRS-13
Information Storage and Retrieval
Evaluation Parameters chapter, E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

is rarely possible, and tests of such systems usually produce just one precision-recall pair or, at most, three or four closely positioned pairs. In such cases a comparison can be made by fitting the SMART results to those of the non-ranking system: cut-offs are chosen in the SMART searches so that, request by request, they match the number of documents retrieved by the non-ranking system. In a simple test comparison, for example, the 35 ADI requests were hand-searched in a KWIC-type concordance of the ADI Abstracts collection, and the results were compared with the SMART Abstracts Thesaurus retrieval run (see Section X). The hand searches were based on four or five keywords for each request, and the final performance of what was intended to be a medium-precision, medium-recall search was 0.22 precision at 0.72 recall. Comparison with SMART requires examining each hand-searched request to see how many documents were retrieved, and then cutting off the SMART ranked output for that request at the same number of documents, which yields one comparable precision-recall pair. The SMART run produced 0.16 precision at 0.64 recall. The hand search naturally benefited from the searcher's free choice of any known synonyms, and higher recall in the hand search would have required choosing further keywords. SMART's fully ranked output, by contrast, allows either high precision at low recall (0.31 precision, 0.31 recall, cut-off at 4 documents) or high recall (0.84 recall, 0.11 precision, cut-off at 33 documents) simply by examining less or more of the output. Techniques of this type will be used in future comparisons of SMART and Medlars searches using a common set of documents and requests.
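The matched cut-off procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the toy document identifiers, and the choice to average precision and recall over requests (macro-averaging) are hypothetical and are not taken from the SMART system itself. The data are invented and do not reproduce the ADI figures.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall_at(ranking: Sequence[str], relevant: Set[str],
                        cutoff: int) -> Tuple[float, float]:
    """Precision and recall when only the top `cutoff` ranked documents are examined."""
    retrieved = ranking[:cutoff]
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def matched_cutoff_comparison(rankings: List[Sequence[str]],
                              relevant_sets: List[Set[str]],
                              hand_counts: List[int]) -> Tuple[float, float]:
    """Cut each ranked list at the number of documents the non-ranking
    (hand) search retrieved for the same request, then average precision
    and recall over requests (macro-averaging is an assumption here)."""
    pairs = [precision_recall_at(r, rel, n)
             for r, rel, n in zip(rankings, relevant_sets, hand_counts)]
    mean_p = sum(p for p, _ in pairs) / len(pairs)
    mean_r = sum(r for _, r in pairs) / len(pairs)
    return mean_p, mean_r


# Invented toy data for two requests (hypothetical, not the ADI results).
rankings = [["d1", "d2", "d3", "d4", "d5", "d6"], ["d7", "d8", "d2", "d5"]]
relevant_sets = [{"d1", "d4", "d9"}, {"d8"}]
hand_counts = [4, 2]  # documents the hand search retrieved for each request

print(matched_cutoff_comparison(rankings, relevant_sets, hand_counts))

# Examining less or more of one ranked list trades precision for recall.
for cutoff in (1, 4, 6):
    print(cutoff, precision_recall_at(rankings[0], relevant_sets[0], cutoff))
```

The matched comparison yields a single averaged pair comparable to the non-ranking result, while the loop at the end shows the flexibility of a ranked output: the same list gives higher precision at a small cut-off and higher recall at a large one.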