IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Using titles alone as input to a retrieval system may be expected to
yield widely differing retrieval performance, depending on two
circumstances:
1. The degree to which titles contain specific and exhaustive
descriptions of the document content, as opposed to "novelty"
titling designed only to draw attention to the document;
2. The type of documentary need demanded by the set of requests
in use, ranging from a need which is satisfied by a total
document only (thus enabling a good title to provide a satisfactory
link), to a need which is satisfied by a small portion
often unrelated to the major subject of the document (where
titles will be quite unsatisfactory).
The first factor may be expected to differ with the subject field
and the amount of control exercised in the technical writing (technical
reports may differ from journal articles, for example). Figure 1 shows that
the Cran-1 Aerodynamics titles are the longest, with IRE-3 Computer Science
titles second longest, and ADI Documentation the shortest, on average. For
example, a Cranfield title picked at random reads "Static Longitudinal
Stability Characteristics of a blunted glider reentry configuration having
79.5° sweepback and [illegible] dihedral at a mach number of 6.2 and angles of attack
up to 20°". Many of the Cranfield documents are technical research reports,
whereas documents in the ADI collection are all conference "short" papers,
and documents in the IRE collection are predominantly journal articles.
The Cranfield titles are undoubtedly the best for retrieval, which explains
why the difference between title and abstract performance is smallest
on that collection.
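
The average title length compared across collections can be measured in a few lines. The following is a minimal sketch only, assuming that title length is counted in whitespace-separated words; the function name mean_title_length and the toy titles are hypothetical illustrations, not data from the Cranfield, IRE, or ADI collections, and not necessarily the counting method used for Figure 1.

```python
def mean_title_length(titles):
    """Return the mean number of whitespace-separated words per title."""
    if not titles:
        raise ValueError("need at least one title")
    return sum(len(title.split()) for title in titles) / len(titles)


# Hypothetical toy titles for illustration only; they are NOT taken from
# the actual test collections discussed in the text.
toy_collections = {
    "descriptive": [
        "Static longitudinal stability of a blunted glider at Mach 6.2",
        "Heat transfer in laminar boundary layers on swept wings",
    ],
    "novelty": [
        "A new concept in computing",
    ],
}

# Mean title length, in words, for each toy group.
averages = {
    name: mean_title_length(titles)
    for name, titles in toy_collections.items()
}

for name, value in averages.items():
    print(f"{name}: {value:.1f} words per title")
```

A longer descriptive title offers more content words for a request to match, which is the property the comparison of average lengths is meant to capture.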
The IRE titles are all quite short, but only a very few are novelty
titles, such as "A new concept in computing". The IRE requests are quite
long, and do match at least one word in most of the titles of the relevant