Scientific Report No. IRS-13
Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Since the association procedure seems independent of a thesaurus procedure, one can ask which is the better method only in the sense of which method, considering the average collection and request, has the better probability of working well. Fig. 15 shows comparative recall-precision curves for thesaurus and association runs on three collections, using the best association strategies. For two of the collections the thesaurus is definitely superior, and for the third collection (Cranfield) the difference in performance is not significant. For the Cranfield collection, the thesaurus in general performs worse than it does for the other collections. It is believed that the reason for the poor performance of this thesaurus is that it was originally constructed for a different purpose, and thus is not properly optimized for the SMART programs. If this thesaurus were to perform a little better, one would expect all three collections to show equivalent curves. Even with the performance curves shown in Fig. 6, however, it is clear that on the average, requests should be entrusted to a thesaurus rather than to an association scheme for maximum performance.

5. Conclusions

A survey of associative retrieval results indicates that

a) on small collections, associations are not useful for determining word meanings or relations, since the majority of the associated pairs depend on purely local meanings of the words and do not reflect their general meaning in the technical text;