IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Figure 33 a) and b), where it is seen that for a similar recall ratio the
abstracts retrieve many more non-relevant documents than the indexing.
These results suggest that, for the particular requests and environment of
this test, the abstracts are too long (too exhaustive) compared to the indexing.
However, the abstract searches are not so inferior to the indexing that the
use of abstracts could not be considered for an operational system; indeed,
to a system manager the loss of performance due to use of abstracts might
be lessened by some selective effort at the input stage, thus resulting
in a very acceptable substitute for the effort of complete manual indexing.
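The comparison "at a similar recall ratio" can be made concrete with a small sketch. The function name, document identifiers, and both rankings below are hypothetical and invented purely for illustration; they are not data from this test. The sketch counts how many non-relevant documents have been retrieved by the point at which a ranked output first reaches a given recall ratio.

```python
def nonrelevant_at_recall(ranking, relevant, target_recall):
    """Count the non-relevant documents retrieved by the point at which
    the ranked list first reaches the target recall ratio."""
    needed = target_recall * len(relevant)
    found = 0
    nonrelevant = 0
    for doc in ranking:
        if doc in relevant:
            found += 1
        else:
            nonrelevant += 1
        if found >= needed:
            return nonrelevant
    return None  # the target recall is never reached by this ranking


# Toy data (assumed): two relevant documents, three non-relevant ones.
relevant = {"r1", "r2"}
indexing_ranking = ["r1", "n1", "r2", "n2", "n3"]
abstract_ranking = ["n1", "r1", "n2", "n3", "r2"]

print(nonrelevant_at_recall(indexing_ranking, relevant, 1.0))
print(nonrelevant_at_recall(abstract_ranking, relevant, 1.0))
```

In this toy case both rankings reach full recall, but the second does so only after retrieving more non-relevant documents, which is the kind of difference the comparison above describes.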
5. Individual Requests and Discussion of Results
Some data on individual requests is presented in order to support
and illustrate the average results already given.
Comparing abstracts and titles, Figure 3[OCRerr] gives results for four
cases, each case corresponding to a different request/relevant document
pair, using results of the Cran-1 collection. Case A gives an example where
the abstract provides two more matching terms and a better retrieval perfor-
mance than the title, but in case B the greater match achieved by the abstract
results in a worse performance for the abstract compared with the title.
The latter case may be explained by remembering that the use of abstracts
provides on average more matching terms between the requests and many of the
documents in the collection; the reason for the case B result is that many
non-relevant documents gain more in matching when abstracts replace titles
than does the relevant document in question, number 713, so that they
overtake it in the ranked output. In cases C and D, the abstract searches do not provide additional
matching concepts, although the weights are increased on abstracts. In
case C the abstract provides superior retrieval to the title, and in case
D the opposite result holds.
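The case B effect, in which a relevant document gains matching terms yet falls in rank, can be illustrated with a minimal sketch. It assumes a plain count of shared terms as the matching score, which is a simplification and not the weighted matching function used in the tests reported here. The request terms, document identifiers, and term sets are all hypothetical.

```python
def rank_of(target, request, docs):
    """Rank (1 = best) of `target` when documents are ordered by the
    number of terms they share with the request. Ties do not push the
    target down: only strictly higher scores count against it."""
    scores = {doc_id: len(request & terms) for doc_id, terms in docs.items()}
    return 1 + sum(1 for s in scores.values() if s > scores[target])


# Hypothetical request and documents, invented for illustration only.
request = {"boundary", "layer", "heat", "transfer", "supersonic"}

titles = {
    "rel": {"boundary", "layer"},   # 2 matching terms
    "n1": {"heat"},                 # 1 matching term
    "n2": {"supersonic"},           # 1 matching term
}

abstracts = {
    "rel": {"boundary", "layer", "heat"},                   # 3 matching terms
    "n1": {"heat", "transfer", "supersonic", "layer"},      # 4 matching terms
    "n2": {"supersonic", "boundary", "layer", "transfer"},  # 4 matching terms
}

print(rank_of("rel", request, titles))
print(rank_of("rel", request, abstracts))
```

Here the relevant document gains one matching term on abstracts, but each non-relevant document gains three, so the relevant document drops from first to third place.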