Scientific Report No. IRS-13 Information Storage and Retrieval

Document Length (chapter)
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Figure 33 a) and b), where it is seen that for a similar recall ratio the abstracts retrieve many more non-relevant documents than the indexing. These results suggest that the abstracts are too long (too exhaustive) for the particular requests and environment of this test, compared to the indexing. However, the abstract searches are not so inferior to the indexing that the use of abstracts could not be considered for an operational system; indeed, to a system manager the loss of performance due to use of abstracts might be lessened by some selective effort at the input stage, thus resulting in a very acceptable substitute for the effort of complete manual indexing.

5. Individual Requests and Discussion of Results

Some data on individual requests are presented in order to support and illustrate the average results already given. Comparing abstracts and titles, Figure 3[OCRerr] gives results for four cases, each case corresponding to a different request/relevant document pair, using results of the Cran-1 collection. Case A gives an example where the abstract provides two more matching terms and a better retrieval performance than the title, but in case B the greater match achieved by the abstract results in a worse performance for the abstract compared with the title. The latter case may be explained by remembering that the use of abstracts provides, on average, more matching terms between the requests and many of the documents in the collection, not only the relevant ones. In case B, many non-relevant documents gain more in matching from the change from titles to abstracts than does relevant document number 713, the document in question.
In cases C and D, the abstract searches do not provide additional matching concepts, although the weights are increased on abstracts. In case C the abstract provides superior retrieval to the title, and in case D the opposite result is seen to hold.
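
The case B effect, where a relevant document gains matching terms on abstracts yet falls in rank, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the request terms, the four toy documents, and the use of a plain count of shared terms as the matching score are hypothetical, and they stand in for the weighted matching used in the actual test. None of the numbers are taken from the Cran-1 results.

```python
from typing import Dict, Set


def match_count(request: Set[str], doc_terms: Set[str]) -> int:
    """Number of request terms that also occur in the document text."""
    return len(request & doc_terms)


def rank_of(target: str, request: Set[str], docs: Dict[str, Set[str]]) -> int:
    """1-based rank of `target` when documents are ordered by match count.

    Ties are resolved pessimistically: any document scoring at least as
    high as the target is counted as ranked ahead of it.
    """
    target_score = match_count(request, docs[target])
    ahead = sum(
        1
        for name, terms in docs.items()
        if name != target and match_count(request, terms) >= target_score
    )
    return ahead + 1


# Hypothetical request and documents (not data from the report).
request = {"boundary", "layer", "heat", "transfer"}

# Title texts: short, so few terms match for any document.
titles = {
    "relevant": {"boundary", "layer"},
    "nonrel_1": {"heat"},
    "nonrel_2": {"transfer"},
    "nonrel_3": {"wing"},
}

# Abstract texts: longer, so matches rise for most documents.
abstracts = {
    "relevant": {"boundary", "layer", "heat"},
    "nonrel_1": {"boundary", "layer", "heat", "transfer"},
    "nonrel_2": {"boundary", "layer", "heat", "transfer"},
    "nonrel_3": {"wing", "layer"},
}

for label, docs in (("titles", titles), ("abstracts", abstracts)):
    matches = match_count(request, docs["relevant"])
    rank = rank_of("relevant", request, docs)
    print(f"{label}: relevant document matches {matches} terms, rank {rank}")
```

In this toy collection the relevant document gains one matching term on abstracts, but two non-relevant documents gain three each and overtake it. This is the same mechanism as the explanation given for case B, shown here with unweighted term counts.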