Scientific Report No. IRS-13, Information Storage and Retrieval
Document Length chapter, E. M. Keen
Harvard University, Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The use of titles only as input to a retrieval system may be expected to give widely differing performance, depending on two circumstances:

1. The degree to which titles contain specific and exhaustive descriptions of the document content, as opposed to "novelty" titling designed only to draw attention to the document;
2. The type of documentary need demanded by the set of requests in use, ranging from a need which is satisfied only by a total document (thus enabling a good title to provide a satisfactory link), to a need which is satisfied by a small portion of the document, often unrelated to its major subject (where titles will be quite unsatisfactory).

The first factor may be expected to differ with the subject field and the amount of control exercised in the technical writing (technical reports may differ from journal articles, for example). Figure 1 shows that the Cran-1 Aerodynamics titles are the longest, with IRE-3 Computer Science titles second longest, and ADI Documentation titles the shortest, on average. For example, a Cranfield title picked at random reads "Static Longitudinal Stability Characteristics of a blunted glider reentry configuration having 79.5° sweepback and [illegible] dihedral at a mach number of 6.2 and angles of attack up to 20°". Many of the Cranfield documents are technical research reports, whereas documents in the ADI collection are all conference "short" papers, and documents in the IRE collection are predominantly journal articles. The Cranfield titles are undoubtedly the best for retrieval, which explains why the difference between title and abstract performance is smallest on that collection.
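The average title length compared in Figure 1 can be illustrated with a minimal sketch. It assumes that length is counted in whitespace-separated words, which the report does not state, and the function name mean_title_length and the two sample titles are hypothetical illustration only, not data from the collections.

```python
def mean_title_length(titles):
    """Return the mean number of whitespace-separated words per title.

    Assumption: "length" means a simple word count; the report does not
    say how the lengths in Figure 1 were measured.
    """
    if not titles:
        raise ValueError("at least one title is required")
    return sum(len(title.split()) for title in titles) / len(titles)


# Toy input only: one short novelty-style title and one longer
# descriptive title (shortened). Not a sample of any collection.
sample_titles = [
    "A new concept in computing",
    "Static longitudinal stability characteristics of a blunted "
    "glider reentry configuration",
]

if __name__ == "__main__":
    print(mean_title_length(sample_titles))
```

Applied separately to the titles of each collection, a count of this kind would give the per-collection averages that such a figure compares.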
The IRE titles are all quite short, but only a very few are novelty titles, such as "A new concept in computing". The IRE requests are quite long, and do match at least one word in most of the titles of the relevant