Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Since the association procedure seems independent of a thesaurus
procedure, one can ask which is the better method only in the sense of
which method, considering the average collection and request, has the
better probability of working well. Fig. 15 shows comparative
recall-precision curves for thesaurus and association runs on three
collections, using the best association strategies. For two of the
collections the thesaurus is definitely superior, and for the third
collection (Cranfield) the difference in performance is not significant.
For the Cranfield collection, thesaurus performance is generally worse
than for the other collections. It is believed that the reason for the
poor performance of this thesaurus is that it was originally constructed
for a different purpose, and thus is not properly optimized for the SMART
programs. If this thesaurus were to perform a little better, one would
expect all three collections to show equivalent curves. Even with the
performance curves shown in Fig. 6, however, it is clear that on the
average, requests should be entrusted to a thesaurus rather than to an
association scheme for maximum performance.
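As an illustration of how averaged recall-precision curves of this kind can
be computed and compared, the following is a minimal sketch in Python. It is
not the SMART evaluation routine: the function names, the interpolation rule
(best precision at any recall at or above each level), and the sample
rankings are assumptions made only for illustration.

```python
def precision_at_recall_levels(ranking, relevant, levels=(0.25, 0.5, 0.75, 1.0)):
    """Interpolated precision at fixed recall levels for a single request.

    ranking  -- document identifiers in retrieval order (hypothetical data)
    relevant -- identifiers judged relevant to the request
    """
    relevant = set(relevant)
    points = []  # (recall, precision) recorded at each relevant document
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    curve = []
    for level in levels:
        # Assumed interpolation: best precision at any recall >= level.
        candidates = [p for r, p in points if r >= level]
        curve.append(max(candidates) if candidates else 0.0)
    return curve


def average_curve(curves):
    """Average several per-request curves level by level."""
    return [sum(column) / len(column) for column in zip(*curves)]


if __name__ == "__main__":
    # Hypothetical rankings of the same request under two methods.
    relevant = {"d1", "d2"}
    thesaurus_run = ["d1", "x", "d2", "y"]
    association_run = ["x", "d1", "y", "d2"]
    levels = (0.5, 1.0)
    print(precision_at_recall_levels(thesaurus_run, relevant, levels))
    print(precision_at_recall_levels(association_run, relevant, levels))
```

A method is preferred on the average when its averaged curve lies above the
other at most recall levels; whether a given gap is significant is a
separate question that this sketch does not address.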
5. Conclusions
A survey of associative retrieval results indicates that
a) on small collections, associations are not useful for determining
word meanings or relations, since the majority of the associated
pairs depend on purely local meanings of the words
and do not reflect their general meaning in the technical text;