Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
performance. It is quite clear that where high recall is required, long
documents are needed, since short documents, or low exhaustivity, constitute
an absolute bar on the recall attainable; this "recall ceiling" is one of
several important criteria for evaluating changes in document length. The
opposite of this statement does not follow automatically, since it is not
necessarily true that for high precision requirements short documents are
needed. For a requirement of highest precision at low recall, some optimum
document length normally exists in a given environment, and tests presented
on SMART will give some idea of this optimum length for the different test
collections used.
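
The recall ceiling idea can be illustrated with a minimal sketch. It assumes, for illustration only, that a relevant document is retrievable only if it shares at least one term with the request; the function name and the toy term sets below are hypothetical and are not taken from the SMART tests.

```python
def recall_ceiling(request_terms, relevant_docs):
    """Fraction of relevant documents that share at least one term with the request.

    Assumption: a document with no matching term can never be retrieved,
    so this fraction is the highest recall attainable for the request.
    """
    if not relevant_docs:
        raise ValueError("at least one relevant document is required")
    request = set(request_terms)
    reachable = sum(1 for terms in relevant_docs if request & set(terms))
    return reachable / len(relevant_docs)


# Hypothetical request and three relevant documents, each represented
# once by its title terms and once by its (longer) abstract terms.
request = {"document", "length", "retrieval"}
titles = [{"indexing", "depth"}, {"retrieval", "tests"}, {"abstract", "quality"}]
abstracts = [
    {"indexing", "depth", "document"},
    {"retrieval", "tests", "length"},
    {"abstract", "quality"},
]

title_ceiling = recall_ceiling(request, titles)
abstract_ceiling = recall_ceiling(request, abstracts)
```

In this toy case the longer representation reaches more of the relevant documents, which is the sense in which short documents place an upper limit on recall.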
Test Results
Test results, which consist of retrieval performance comparisons, are
given first for abstracts versus titles, then for abstracts versus full text,
and finally for abstracts versus indexing. In each of these sub-sections,
performance comparisons will be made using three main techniques:
- Overall performance measures, consisting of normalized recall
values, normalized precision values, and precision/recall graphs;
- Recall ceiling data, using recall alone;
- Individual request and relevant document data, using tables
and graphs of the numbers of requests and documents that favor
a given option.
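
The first and third techniques listed above can be sketched as follows. The formulas are the rank-based definitions of normalized recall and normalized precision usually given in the SMART literature; this section does not restate them, so their use here is an assumption, and all function names and sample values are hypothetical.

```python
import math


def normalized_recall(ranks, collection_size):
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n, N = len(ranks), collection_size
    if n == 0 or N <= n:
        raise ValueError("need 0 < number of relevant documents < collection size")
    ideal = sum(range(1, n + 1))
    return 1 - (sum(ranks) - ideal) / (n * (N - n))


def normalized_precision(ranks, collection_size):
    """1 - (sum of log actual ranks - sum of log ideal ranks) / log(N! / ((N-n)! n!))."""
    n, N = len(ranks), collection_size
    if n == 0 or N <= n:
        raise ValueError("need 0 < number of relevant documents < collection size")
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # lgamma(k + 1) equals log(k!), which avoids computing large factorials.
    worst = math.lgamma(N + 1) - math.lgamma(N - n + 1) - math.lgamma(n + 1)
    return 1 - (actual - ideal) / worst


def count_preferences(scores_a, scores_b):
    """Count requests that favor option A, favor option B, or show no difference."""
    pairs = list(zip(scores_a, scores_b))
    favor_a = sum(1 for a, b in pairs if a > b)
    favor_b = sum(1 for a, b in pairs if b > a)
    ties = len(pairs) - favor_a - favor_b
    return favor_a, favor_b, ties


# Hypothetical ranks of three relevant documents in a 10-document collection.
best_recall = normalized_recall([1, 2, 3], 10)
worst_recall = normalized_recall([8, 9, 10], 10)
best_precision = normalized_precision([1, 2, 3], 10)
worst_precision = normalized_precision([8, 9, 10], 10)

# Hypothetical per-request scores for two document-length options.
preferences = count_preferences([0.9, 0.5, 0.7], [0.8, 0.6, 0.7])
```

Both normalized measures equal 1 when every relevant document is ranked at the top and 0 when every relevant document is ranked at the bottom. The preference counts support the per-request comparison, which an average over all requests can hide.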
After the main test results for each sub-section have been presented,
additional test results of value are also described. All results are averages
over the set of requests being tested, as indicated in the figures.