Scientific Report No. ISR-13 Information Storage and Retrieval
Document Length (chapter)
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
on titles. It can be seen that 28 of the documents superior on abstracts
have improved rank positions (compared with titles) by 76 to 150 places,
thus explaining why many of the requests do work better on abstracts. Since
a large number of documents exhibit quite significant improvements in rank
on titles compared with abstracts, however, the results that show superiority
of titles in the middle of the precision-recall curve seem quite reasonable.
The results presented so far have all been based on the cosine cor-
relation and numeric vector matching procedure, which is generally superior to
simpler procedures. Results are given in Figures 13 to 16 based on unweighted
vectors (logical) using the overlap correlation, comparing titles and abstracts
with the stem, Cranfield collection. For this process, the title match is
superior by a small amount at the high precision end of the curve, below
0.65 recall, and this result is also reflected in the normalized measures
(Figure 13). This same precision superiority is seen in the number of requests
favoring abstracts and titles in Figure 14, where using normalized recall,
the abstracts are superior, but using normalized precision the titles do
better. The difference curve, also given in Figure 14, shows that using
normalized precision all but 2 of the 24 requests performing better with
titles do so by a greater difference than the 18 which are better on the
abstracts. Figures 15 and 16 give data for the 198 individual relevant
documents involved, showing that 13 relevant documents changed rank by over
100 places in favor of abstracts, but that more documents changed a smaller
number of places in favor of titles than abstracts.
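For reference, the normalized recall and normalized precision cited above are
usually defined as follows; the standard SMART forms are assumed here, and the
report's own definitions may differ in detail. With N documents in the
collection, n of them relevant to a request, and r_i the rank of the i-th
relevant document in the retrieved list:

\[
R_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{n} r_i - \sum_{i=1}^{n} i}{n(N - n)},
\qquad
P_{\mathrm{norm}} = 1 - \frac{\sum_{i=1}^{n} \log r_i - \sum_{i=1}^{n} \log i}{\log \binom{N}{n}}
\]

Under these definitions both measures equal 1 for a perfect ranking (all
relevant documents at the top of the list) and approach 0 for the worst
possible ranking.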
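To make the contrast between the two matching procedures concrete, the sketch
below compares a cosine correlation on weighted (numeric) term vectors with an
overlap correlation on unweighted (logical) term sets. It is an illustration
only: the term names, weights, and the shared-terms-over-smaller-set form of
the overlap measure are assumptions, not definitions quoted from this report.

import math

def cosine_correlation(query, doc):
    # Cosine correlation between two weighted (numeric) term vectors,
    # each given as a dictionary mapping a term to its weight.
    dot = sum(w * doc.get(t, 0.0) for t, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0
    return dot / (norm_q * norm_d)

def overlap_correlation(query_terms, doc_terms):
    # Overlap correlation between two unweighted (logical) term sets:
    # number of shared terms divided by the size of the smaller set.
    query_terms, doc_terms = set(query_terms), set(doc_terms)
    if not query_terms or not doc_terms:
        return 0.0
    return len(query_terms & doc_terms) / min(len(query_terms), len(doc_terms))

# Hypothetical title and abstract vectors for the same document.
title = {"boundary": 1.0, "layer": 1.0, "transition": 1.0}
abstract = {"boundary": 3.0, "layer": 3.0, "transition": 1.0,
            "supersonic": 2.0, "flow": 2.0, "heat": 1.0}
query = {"boundary": 1.0, "layer": 1.0, "heat": 1.0}

print(cosine_correlation(query, title))      # weighted, numeric match
print(cosine_correlation(query, abstract))
print(overlap_correlation(query, title))     # unweighted, logical match
print(overlap_correlation(query, abstract))

As the example suggests, the longer abstract vector can pick up query terms
that the title lacks, while the overlap measure on short logical vectors is
more sensitive to the size of the smaller (title) vector.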
These figures are presented in order to show that there is no incon-
sistency between the results on abstracts and titles obtained with SMART and
those obtained on the Cranfield Project (2). The results of searches on the
same titles and abstracts, using the same collection, requests and relevance