Scientific Report No. IRS-13 Information Storage and Retrieval
Document Length
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A) Abstracts versus Titles
Overall performance measures are given in Figures 5, 6 and 7.
Nine comparisons of abstracts and titles are presented using the normalized
measures in Figure 5, the runs being made on different dictionaries using the
cosine numeric matching algorithm for the three different collections. Every
case shows the abstract to be superior to title, by as much as 0.0879 normalized
recall in the case of ADI Stem. Subsequent presentations will concentrate
on the stem and thesaurus dictionaries only, for these three collections.
Precision/recall graphs using stem are given in Figure 6. Title is slightly
superior to abstract between 0.25 and 0.55 recall on the Cran-1 collection;
otherwise the abstracts are always superior. Figure 7 repeats the comparison
using thesaurus dictionaries, and in this case the abstract is superior
to the title on Cran-1 over the whole curve, but on ADI the title is slightly
superior to the abstract at low recall values. These graphs show that for
the IRE-3 collection the abstract is always clearly superior to the title, but
on the ADI and Cran-1 collections the title is sometimes as good as the abstract
in the low recall/high precision region.
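
As an illustration of the two quantities used in these comparisons, the following is a minimal sketch of a cosine correlation between a request and a document, and of a rank-based normalized recall. It is not the code used in the experiments. The function names, the dictionary representation of term weights, and the particular normalized recall formula (one minus the normalized difference between the actual and ideal rank sums of the relevant documents) are assumptions made for the example; this section does not state the formula.

```python
import math


def cosine(query, doc):
    """Cosine correlation between two sparse term-weight vectors.

    Both arguments are dicts mapping a term to its weight (a hypothetical
    representation chosen for this sketch).
    """
    dot = sum(w * doc.get(term, 0.0) for term, w in query.items())
    norm_q = math.sqrt(sum(w * w for w in query.values()))
    norm_d = math.sqrt(sum(w * w for w in doc.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0  # an empty vector has no terms in common with anything
    return dot / (norm_q * norm_d)


def normalized_recall(relevant_ranks, collection_size):
    """Rank-based normalized recall (assumed form).

    relevant_ranks: 1-based rank positions of the relevant documents.
    collection_size: total number of documents ranked.
    Returns 1.0 when the relevant documents occupy the top ranks and
    0.0 when they occupy the bottom ranks.
    """
    n = len(relevant_ranks)
    if n == 0 or n >= collection_size:
        raise ValueError("need at least one relevant and one non-relevant document")
    ideal_sum = n * (n + 1) // 2  # rank sum 1 + 2 + ... + n
    excess = sum(relevant_ranks) - ideal_sum
    return 1.0 - excess / (n * (collection_size - n))
```

Under this assumed definition, a difference of 0.0879 between two runs corresponds to the relevant documents of one run lying, on average, that fraction of the non-relevant part of the collection further down the ranked list.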
Before presenting the recall ceiling data for these results, some
explanations are necessary. For purposes of the experimental tests, requests
are searched in the system and every single document in the collection is
correlated with the request and is given a rank position in the output list.
No cut-off is used to "retrieve", say, half the collection, since a cut-off
might be made at any level by a user when he examines the output. In a real-
life situation, it will be a rare thing for a user to examine documents that
have a very low correlation with the request, and it seems certain that users
would never examine documents with zero correlation; indeed, willingness
to examine such would remove the need for the retrieval system altogether.
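
The distinction drawn above can be sketched as follows: the experimental procedure assigns every document a rank, whereas a user would stop before the zero-correlation documents. The sketch assumes that the recall ceiling is the fraction of relevant documents having a non-zero correlation with the request; the function names and the sample scores are hypothetical and are not taken from the experiments.

```python
def rank_all(scores):
    """Rank every document by descending correlation, with no cut-off.

    scores: dict mapping a document id to its correlation with the request.
    Ties are broken by document id so the ordering is reproducible.
    """
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def recall_ceiling(scores, relevant):
    """Highest recall reachable without examining zero-correlation documents.

    relevant: set of document ids judged relevant to the request.
    (Assumed definition: relevant documents with correlation > 0,
    divided by all relevant documents.)
    """
    if not relevant:
        raise ValueError("the request has no relevant documents")
    reachable = sum(1 for doc_id in relevant if scores.get(doc_id, 0.0) > 0.0)
    return reachable / len(relevant)


# Hypothetical example: four documents, two of them relevant.
scores = {"d1": 0.8, "d2": 0.0, "d3": 0.3, "d4": 0.0}
relevant = {"d1", "d2"}
ranking = rank_all(scores)                  # all four documents receive a rank
ceiling = recall_ceiling(scores, relevant)  # only d1 can be reached by a user
```

In this example the relevant document d2 still receives a rank position in the full output list, but a user who never examines zero-correlation documents could not reach it, so recall cannot exceed one half.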