Scientific Report No. IRS-13
Information Storage and Retrieval
Document Length
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

compared to the manual indexing available for that collection. This comparison is made because the indexing takes up about half the length of the abstracts. It is a valid comparison because of the unusual nature of the indexing, which is "a base list of words, selected directly from the title and text of a document ... presented without any reference whatsoever to a control list for synonyms, related terms, etc." [1, page [OCRerr]l, see also pages [OCRerr]8, 52]. The controls used in indexing permitted the confounding of singular and plural word forms, as well as variant spellings, but the index terms were otherwise culled from the documents in natural language. The indexing used is then, in effect, another abstract of the documents, shorter in length than the author abstract, and produced by trained indexers. It is expected that the choice of subject ideas from the whole document by the indexers will be very similar on average to the choice of ideas made by the abstractors, although the area of overlap has not been determined.

Retrieval runs of the above comparisons are presented using the stem and thesaurus dictionaries, and all results use the cosine correlation and numeric vectors, unless otherwise stated. The comparative lengths of the documents in these comparisons are given in Figure 1. Although the lengths given in the figure are based on the concepts resulting from the documents being looked up in the suffix 's' dictionary, relative lengths will remain the same using the stem and thesaurus dictionaries.

3. Effect of Changes in Document Length

In this part, the effect of changes in document length on the match between requests and documents is considered, followed by the expected differences in retrieval performance.
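As an illustration of how document length can enter the match between a request and a document, the cosine correlation over numeric (weighted) concept vectors may be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure used in the retrieval runs reported here: the function name `cosine_correlation`, the dictionary representation of vectors, and the concept names and weights are all hypothetical and chosen only for the example.

```python
import math


def cosine_correlation(request, document):
    """Cosine of the angle between two sparse concept-weight vectors.

    Both arguments map a concept name to a numeric weight
    (hypothetical representation, assumed for this sketch).
    """
    # Only concepts present in both vectors contribute to the dot product.
    dot = sum(w * document.get(c, 0) for c, w in request.items())
    norm_r = math.sqrt(sum(w * w for w in request.values()))
    norm_d = math.sqrt(sum(w * w for w in document.values()))
    if norm_r == 0 or norm_d == 0:
        return 0.0  # an empty vector matches nothing
    return dot / (norm_r * norm_d)


# Toy data (assumed): the same two matching concepts appear in a short
# and in a longer version of a document.
request = {"retrieval": 1, "length": 1}
short_doc = {"retrieval": 1, "length": 1, "index": 1}
long_doc = {
    "retrieval": 1, "length": 1, "index": 1,
    "abstract": 1, "thesaurus": 1, "stem": 1,
}

short_score = cosine_correlation(request, short_doc)
long_score = cosine_correlation(request, long_doc)
print(short_score, long_score)
```

In this toy case the longer vector carries additional concepts that do not match the request, which enlarges its norm and lowers the correlation. A longer document may equally gain matching concepts, so the sketch shows only the mechanism and says nothing about the results obtained on the test collections.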