Scientific Report No. IRS-13 Information Storage and Retrieval

Chapter: Document Length
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A) Abstracts versus Titles

Overall performance measures are given in Figures 5, 6 and 7. Nine comparisons of abstracts and titles are presented using the normalized measures in Figure 5, the runs being made on different dictionaries using the cosine numeric matching algorithm for the three different collections. Every case shows the abstract to be superior to the title, by as much as 0.0879 normalized recall in the case of ADI Stem. Subsequent presentations will concentrate on the stem and thesaurus dictionaries only, for these three collections.

Precision/recall graphs using stem are given in Figure 6. Title is slightly superior to abstract between 0.25 and 0.55 recall on the Cran-1 collection; otherwise the abstracts are always superior. Figure 7 repeats the comparison using thesaurus dictionaries, and in this case the abstract is superior to the title on Cran-1 over the whole curve, but on ADI the title is slightly superior to the abstract at low recall values. These graphs show that for the IRE-3 collection, abstract is always clearly superior to title, but on the ADI and Cran-1 collections the title is sometimes as good as the abstract in the low recall/high precision region.

Before presenting the recall ceiling data for these results, some explanations are necessary. For purposes of the experimental tests, requests are searched in the system and every single document in the collection is correlated with the request and is given a rank position in the output list. No cut-off is used to "retrieve", say, half the collection, since a cut-off might be made at any level by a user when he examines the output.
In a real-life situation, it will be rare for a user to examine documents that have a very low correlation with the request, and it seems certain that users would never examine documents with zero correlation; indeed, willingness to examine such documents would remove the need for the retrieval system altogether.
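
The search procedure described above can be illustrated with a minimal Python sketch. It is not the SMART implementation: the term weights are plain word counts, the toy request, documents and relevance judgments are hypothetical, and the function names (cosine, rank_all, recall_ceiling) are invented for the illustration. In particular, the sketch assumes that the recall ceiling is the fraction of relevant documents having a nonzero correlation with the request, which is an interpretation of the argument above and not a definition taken from the report.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine correlation between two term-weight vectors held as dicts."""
    dot = sum(weight * b.get(term, 0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_all(request, documents):
    """Correlate every document with the request and rank all of them.

    No cut-off is applied: documents with zero correlation still receive
    a rank position. Ties are broken by document identifier (an assumption).
    """
    scored = [(doc_id, cosine(request, vec)) for doc_id, vec in documents.items()]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


def recall_ceiling(ranking, relevant):
    """Assumed definition: share of relevant documents with nonzero correlation."""
    reachable = {doc_id for doc_id, score in ranking if score > 0}
    return len(reachable & relevant) / len(relevant)


# Hypothetical toy data, not drawn from the ADI, Cran-1 or IRE-3 collections.
request = Counter("boundary layer flow".split())
documents = {
    "d1": Counter("laminar boundary layer".split()),
    "d2": Counter("supersonic flow over wing".split()),
    "d3": Counter("heat transfer in pipes".split()),
    "d4": Counter("document retrieval".split()),
}
relevant = {"d1", "d3"}

ranking = rank_all(request, documents)
for position, (doc_id, score) in enumerate(ranking, start=1):
    print(position, doc_id, round(score, 4))
print("recall ceiling:", recall_ceiling(ranking, relevant))
```

In this toy case the relevant document d3 shares no terms with the request, so it is ranked only by the tie-breaking rule, and a user who stops at the last nonzero correlation can never reach it.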