Scientific Report No. IRS-13 Information Storage and Retrieval

Document Length chapter
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... performance. It is quite clear that where high recall is required, long documents are needed, since short documents, or low exhaustivity, constitute an absolute bar on the recall attainable; this "recall ceiling" is one of several important criteria for evaluating changes in document length. The converse does not follow automatically: it is not necessarily true that high precision requirements call for short documents. For a requirement of highest precision at low recall, some optimum document length normally exists in a given environment, and the tests presented on SMART will give some idea of this optimum length for the different test collections used.

Test Results

Test results, which consist of retrieval performance comparisons, are given first for abstracts versus titles, then for abstracts versus full text, and finally for abstracts versus indexing. In each of these sub-sections, performance comparisons will be made using three main techniques:

- Overall performance measures, consisting of normalized recall values, normalized precision values, and precision/recall graphs;
- Recall ceiling data, using recall alone;
- Individual request and relevant document data, using tables and graphs of the numbers of requests and documents that favor a given option.

After the main test results for each sub-section have been presented, additional test results of value are also described. All results are averages over the set of requests being tested, as indicated in the figures.
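The measures named above can be illustrated with a minimal Python sketch. It is not taken from this report: it assumes the rank-based definitions of normalized recall and normalized precision commonly associated with SMART evaluation, and it assumes that a relevant document counts toward the recall ceiling only if it shares at least one term with the request. All function names and sample data are hypothetical.

```python
import math


def normalized_recall(relevant_ranks, collection_size):
    """Assumed definition: 1 - (sum r_i - sum i) / (n * (N - n)),
    where r_i are the 1-based ranks of the n relevant documents
    in a ranked collection of N documents."""
    n, N = len(relevant_ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need 0 < number of relevant documents < collection size")
    ideal = sum(range(1, n + 1))  # ranks 1..n, the best possible ordering
    return 1.0 - (sum(relevant_ranks) - ideal) / (n * (N - n))


def normalized_precision(relevant_ranks, collection_size):
    """Assumed definition: 1 - (sum ln r_i - sum ln i) / ln(N! / ((N - n)! n!))."""
    n, N = len(relevant_ranks), collection_size
    if not 0 < n < N:
        raise ValueError("need 0 < number of relevant documents < collection size")
    actual = sum(math.log(r) for r in relevant_ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # ln of the binomial coefficient, computed with lgamma to avoid huge factorials
    denom = math.lgamma(N + 1) - math.lgamma(N - n + 1) - math.lgamma(n + 1)
    return 1.0 - (actual - ideal) / denom


def recall_ceiling(request_terms, relevant_docs):
    """Fraction of relevant documents that can be retrieved at all.
    Assumption: a document is retrievable only if it shares a term with the request."""
    if not relevant_docs:
        raise ValueError("need at least one relevant document")
    reachable = sum(1 for terms in relevant_docs if request_terms & terms)
    return reachable / len(relevant_docs)


def average_over_requests(values):
    """Results are reported as averages over the set of requests tested."""
    return sum(values) / len(values)


# Hypothetical example: 3 relevant documents in a 10-document collection.
best = normalized_recall([1, 2, 3], 10)    # relevant documents ranked first: 1.0
worst = normalized_recall([8, 9, 10], 10)  # relevant documents ranked last: 0.0

# Hypothetical short documents (e.g. titles): two of four share no request term.
ceiling = recall_ceiling(
    {"length", "recall"},
    [{"recall", "precision"}, {"title"}, {"document", "length"}, {"abstract"}],
)  # 0.5
```

Under these assumptions, shortening the documents can only remove matching terms, so the recall ceiling can fall but never rise, which is the "absolute bar" on recall described above.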