IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
position better than the relevant one; this is due to a highly weighted
common term ("information") which gives a high correlation to the
non-relevant document. For this request, some extra weight placed on the
important term [OCRerr] would preserve the perfect rank position of relevant
document 09 on full text.
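
The weighting effect described above can be sketched in a few lines. This is
an illustrative sketch only: it assumes a cosine correlation between weighted
term vectors, and every term, weight, and document name below is hypothetical
and not taken from the ADI collection or from the request under discussion.

```python
import math


def cosine(a, b):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(request, docs):
    """Document names ordered by decreasing correlation with the request."""
    return sorted(docs, key=lambda name: cosine(request, docs[name]), reverse=True)


# Hypothetical full-text vectors: the non-relevant document is dominated by
# one heavily weighted common term, the relevant one spreads its weight.
docs = {
    "relevant": {"information": 2, "dissemination": 3, "library": 4, "user": 4},
    "non_relevant": {"information": 12, "science": 1},
}

# Hypothetical request, first with equal weights, then with extra weight
# on the term assumed to carry the real subject of the request.
plain_request = {"information": 1, "dissemination": 1}
boosted_request = {"information": 1, "dissemination": 4}

plain_order = rank(plain_request, docs)
boosted_order = rank(boosted_request, docs)

print("equal weights:   ", plain_order)
print("boosted weights: ", boosted_order)
```

With equal request weights the common term alone carries the non-relevant
document to the top; raising the weight of the more specific request term
reverses the order in this toy case.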
The average performance results show that although text is superior
to abstracts, the improvement is small. Since the ADI abstracts are shorter
than those used in Cran-1 or IRE, and probably do not include so much useful
information, longer abstracts might perform better than full text. Any
validation of full text searching would need to be carried out with text lengths
more comparable to the average journal article or report than the short papers
used, but even the use of these somewhat unsatisfactory documents suggests
that text searching is feasible and worth further study.
The small superiority of indexing over abstracts can be explained
by two possible reasons:
1. The indexers chose some terms from the full texts of the
documents that the abstractors failed to include, and some of these
terms represented subject notions that were asked for in the
requests.
2. By choosing nearly half the number of terms contained in the
abstracts, the indexers avoided notions that are not asked for
in the requests, notions which only serve to increase the matches
between requests and non-relevant documents.
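
Both reasons can be illustrated with a small sketch. It assumes unweighted
term sets and a binary cosine match, which is a simplification chosen for the
example; the request, the terms, and the two documents are hypothetical and
are not drawn from the test collections.

```python
import math


def binary_cosine(request, document):
    """Cosine correlation for unweighted term sets."""
    if not request or not document:
        return 0.0
    shared = len(request & document)
    return shared / math.sqrt(len(request) * len(document))


# Hypothetical request.
request = {"boundary", "layer", "transition"}

# Reason 1: the indexer picks "transition" from the full text of the
# relevant document, a term its abstract does not contain.
relevant_abstract = {"boundary", "layer", "heat", "transfer", "plate", "experiment"}
relevant_indexing = {"boundary", "layer", "transition"}

# Reason 2: the indexing of the non-relevant document has half as many
# terms as its abstract and omits the incidental notion "layer".
non_relevant_abstract = {"shock", "wave", "layer", "tunnel", "measurement", "pressure"}
non_relevant_indexing = {"shock", "wave", "tunnel"}


def margin(relevant, non_relevant):
    """Gap in correlation between the relevant and non-relevant document."""
    return binary_cosine(request, relevant) - binary_cosine(request, non_relevant)


abstract_margin = margin(relevant_abstract, non_relevant_abstract)
indexing_margin = margin(relevant_indexing, non_relevant_indexing)

print("margin using abstracts:", round(abstract_margin, 3))
print("margin using indexing: ", round(indexing_margin, 3))
```

In this toy case the indexing separates the two documents more widely than
the abstracts do, because the relevant document gains a requested term and
the non-relevant document loses an accidental match.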
The second reason was previously illustrated by Figure 33. Several
examples of the first reason have been found; for example, in three documents
the terms `1bust[OCRerr][OCRerr], lf5[OCRerr]aij[OCRerr][OCRerr]gl? and "Quasi-conical (flow)" appear in the indexing
but not in the abstracts, and these ideas are all demanded in requests.
There are cases also where the specific subject such as "wing" and "channels"