Scientific Report No. IRS-13, Information Storage and Retrieval
Gerard Salton, Harvard University
Chapter: "Document Length," by E. M. Keen
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

position better than the relevant one; this is due to a highly weighted common term ("information"), which gives a high correlation to the non-relevant document. For this request, placing some extra weight on the important term [illegible] would preserve the perfect rank position of relevant document 09 for full text.

The average performance results show that although full text is superior to abstracts, the improvement is small. Since the ADI abstracts are shorter than those used in Cran-1 or IRE, and probably contain less useful information, longer abstracts might perform better than full text. Any validation of full-text searching would need to use texts whose length is more comparable to the average journal article or report than the short papers used here. Even so, these somewhat unsatisfactory documents suggest that text searching is feasible and worth further study.

The small superiority of indexing over abstracts has two possible explanations:

1. The indexers chose some terms from the full texts of the documents that the abstractors failed to include, and some of these terms represented subject notions that were asked for in the requests.
2. By choosing nearly half as many terms as the abstracts contain, the indexers avoided notions that are not asked for in the requests. Such notions serve only to increase the matches between requests and non-relevant documents.

The second case was previously illustrated by Figure 33. Several examples of the first have been found. For example, in three documents the terms "bust[illegible]", "[illegible]"
and "Quasi-conical (flow)" appear in the indexing but not in the abstracts, and all of these notions are asked for in the requests. There are also cases where the specific subject such as "wing" and "channels"