Scientific Report No. IRS-13 Information Storage and Retrieval Document Length chapter E. M. Keen Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

(Computer Science), with the stem dictionary on the ADI Collection (Documentation), and with both stem and thesaurus dictionaries employing weighting and the cosine correlation on the Cran-1 Collection (Aerodynamics). Titles perform better than abstracts on ADI using the thesaurus, which is probably due to poor abstracts rather than good titles. Titles also perform well on Cran-1 when simple matching (overlap correlation) and no weights (Logical Vectors) are used; this is due to the very good length and quality of titling in aerodynamics.

c) The use of abstracts in the ADI collection was only slightly inferior to full text at high precision using the stem dictionary, and at high recall using the stem and thesaurus dictionaries. It is suggested that the increase in recall/precision performance and increase in recall ceiling from 0.92 to 1.00 is unlikely to be worth the increased input and storage costs, and extended search time, and the use of slightly longer abstracts would show the text to have no advantages at all. Further work on full text processing of a more typical set of technical documents in another subject area is required.

d) The use of abstracts in the Cran-1 Collection gave a somewhat inferior performance to the shorter precis made by the manual indexers on the Cranfield Project. Further work is required to determine whether the apparently good quality abstracts suffer either from excessive length or failure to include some vital subject notions that the indexers included.
The abstract performance is, however, sufficiently good to question the need for indexing for high performance, particularly since the indexing was more exhaustive than is practiced in many operational situations.
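
The two matching functions named in the conclusions above, cosine correlation on weighted vectors and overlap correlation on logical (unweighted) vectors, can be illustrated with a minimal sketch. The formulas are assumptions: the cosine is the usual normalized inner product, and the overlap is taken as the sum of term-wise minima divided by the smaller of the two vector totals, which may differ in detail from the form used in the experiments. The function names, the query, and the title and abstract vectors are hypothetical and are not drawn from the ADI or Cran-1 collections.

```python
import math


def cosine_correlation(query, doc):
    """Normalized inner product of two sparse term-weight vectors (dicts)."""
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)


def overlap_correlation(query, doc):
    """Assumed overlap form: sum of term-wise minima over the smaller total."""
    shared = sum(min(w, doc.get(term, 0.0)) for term, w in query.items())
    smaller = min(sum(query.values()), sum(doc.values()))
    return shared / smaller if smaller else 0.0


def to_logical(vector):
    """Drop the weights: every term that is present gets weight 1."""
    return {term: 1.0 for term, w in vector.items() if w > 0}


# Hypothetical vectors: a short title and a longer, weighted abstract.
query = {"boundary": 1.0, "layer": 1.0, "flow": 1.0}
title = {"boundary": 1.0, "layer": 1.0}
abstract = {"boundary": 3.0, "layer": 2.0, "flow": 1.0, "wing": 4.0, "heat": 2.0}

title_overlap = overlap_correlation(to_logical(query), to_logical(title))
abstract_overlap = overlap_correlation(to_logical(query), to_logical(abstract))
title_cosine = cosine_correlation(query, title)
abstract_cosine = cosine_correlation(query, abstract)

print(title_overlap, abstract_overlap, title_cosine, abstract_cosine)
```

In this toy case the overlap measure cannot separate the title from the abstract, since the shorter vector is fully covered in both comparisons, while the cosine measure penalizes the abstract for its additional heavily weighted terms. This only illustrates how document length interacts with the matching function; it does not reproduce any result reported here.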