Scientific Report No. IRS-13
Information Storage and Retrieval
Document Length
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

needed for handling documents which are not available with a suitable abstract. Intermediate selections from the full text of documents, containing more material than the abstract yet less than the full text, may also be possible; for example, section headings and figure captions might be added to the abstract. In the present study, several different selections of documents will be compared, the shortest being titles only, and the longest a collection of full-text "short" conference papers.

Evaluation of these different document lengths will center on the retrieval performance achieved. Other evaluation criteria, such as search time and input cost, will be of considerable importance in operational environments, but in the experimental tests being performed on the SMART system no reasonable simulation test of these criteria can yet be made.

2. SMART Test Comparisons

Three series of comparisons of document length are presented. Firstly, the use of abstracts (including titles) is compared to the use of document titles alone. Results are presented for the three collections of documents being used for current experiments, in the subject areas of computer science (IRE-3, 780 documents, 34 requests), aerodynamics (Cran-1, 200 documents, 42 requests), and documentation (ADI, 82 documents, 35 requests). Secondly, using the ADI Collection, the abstracts are compared to the use of full text. In the main results, the text used includes the abstract, and both naturally include the title, so that three distinct document lengths are available for comparison.
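The nesting of the three document lengths (title, abstract including title, text including abstract) can be illustrated with a minimal Python sketch. It is not the SMART procedure: plain word counts and cosine correlation are assumed here as a stand-in for SMART's dictionary processing, and the names `term_vector`, `cosine`, `nested_versions`, and `rank` are hypothetical helpers introduced only for illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased word counts (assumed stand-in for dictionary lookup)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine correlation between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def nested_versions(title, abstract_body, text_body):
    """Build the three document lengths so that each includes the shorter one."""
    abstract = f"{title} {abstract_body}"
    full_text = f"{abstract} {text_body}"
    return {"title": title, "abstract": abstract, "text": full_text}


def rank(request, docs):
    """Order document ids by decreasing correlation with the request.

    docs maps a document id to the text of one chosen document length.
    Ties are broken by document id so the ordering is deterministic.
    """
    query = term_vector(request)
    scored = [(cosine(query, term_vector(text)), doc_id)
              for doc_id, text in docs.items()]
    return [doc_id for _, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]
```

To compare document lengths under these assumptions, one would call `rank` once per length on the same requests and evaluate each resulting ranking against the same relevance judgments.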
The ADI Text Collection consists of a set of short conference papers of average length 1,380 words; it is therefore not typical of scientific papers in general, and does not pose any problems due to non-textual material. The third comparison is made with the Cran-1 abstracts which are