Scientific Report No. IRS-13
Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

associations, are best for precision; procedures which increase the number of associated pairs or their weight are best for recall, but the effect is small.

It is often hypothesized that association procedures can simulate the operation of a thesaurus or other classical word normalization procedures. This can be tested on the Cranfield collection, since a thesaurus is available as well as a form of indexing. The indexing, although very detailed and exhaustive (averaging over 30 terms per abstract), is not carried through a rigorous term normalization, and the results with it may perhaps be unusual. It is felt, however, that in terms of overall performance, the exhaustivity and high quality of the indexing compensate for the lack of normalization, so that results should be roughly comparable.

As expected from the earlier discussions in this section, the association procedure operates in a unique and virtually independent way, simulating neither indexing nor thesaurus. Out of 42 requests, the thesaurus improves the performance of 24, the indexing improves the performance of 24, and the association procedure (with a frequency range of 6-100, cutoff of .45, weight of 1.0) improves 25. Yet only 12 requests are improved by all three methods (even on a random basis at least 8 would be improved by all three). Table 12 shows two-by-two contingency tables for co-improvement of requests by thesaurus and association, and Table 13 shows two-by-two contingency tables for co-improvement of requests by indexing and association. None of the tables is significant, i.e., there is no co-variation of association results and thesaurus results, or of association results and indexing results.
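The arithmetic behind the random-basis figure and the contingency-table test can be sketched as follows. This is a minimal illustration, not the procedure used in the report: it assumes the three methods improve requests independently, and it uses a plain Pearson chi-square statistic without continuity correction. The function names are hypothetical, and the cell counts in the sample table are invented for illustration (chosen only to match the reported marginals of 24 and 25 out of 42); the actual cell counts are those of Tables 12 and 13.

```python
def expected_joint_improvements(n_requests, improved_counts):
    """Expected number of requests improved by every method,
    assuming the methods improve requests independently."""
    expected = float(n_requests)
    for count in improved_counts:
        expected *= count / n_requests
    return expected


def chi_square_2x2(both, only_a, only_b, neither):
    """Pearson chi-square statistic (1 degree of freedom,
    no continuity correction) for a two-by-two table."""
    n = both + only_a + only_b + neither
    row_a = both + only_a          # improved by method A
    row_not_a = only_b + neither   # not improved by method A
    col_b = both + only_b          # improved by method B
    col_not_b = only_a + neither   # not improved by method B
    denom = row_a * row_not_a * col_b * col_not_b
    if denom == 0:
        raise ValueError("a marginal total is zero")
    return n * (both * neither - only_a * only_b) ** 2 / denom


# Reported figures: 42 requests; thesaurus improves 24,
# indexing 24, association 25.
N_REQUESTS = 42
expected_all_three = expected_joint_improvements(N_REQUESTS, [24, 24, 25])
print(f"expected by chance: {expected_all_three:.2f}")

# HYPOTHETICAL cell counts, consistent with the marginals only
# (thesaurus 15 + 9 = 24, association 15 + 10 = 25, total 42).
stat = chi_square_2x2(both=15, only_a=9, only_b=10, neither=8)

# 3.841 is the 5 percent critical value of chi-square with 1 d.f.
CRITICAL_5_PERCENT = 3.841
print(f"chi-square = {stat:.3f}, significant: {stat > CRITICAL_5_PERCENT}")
```

Under the independence assumption the expected number of requests improved by all three methods comes out slightly above 8, in line with the figure quoted in the text, against the 12 actually observed.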
A set of recall-precision curves for thesaurus and associations is shown in Fig. 13, and for indexing and association in