Scientific Report No. IRS-13, Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

significance of these pairs; the thesaurus does not connect "Navier-Stokes" with any of these terms. As a result of these associations, this relevant document is promoted from rank position 143 to rank position 4.

The results of retrieval experiments can be used to determine the best set of parameters for the association process. The conclusions agree well with those deduced from the examination of the pairs in part 3. It is noted there, for example, that words that are either very frequent or very rare tend to have non-significant associations. The fraction of meaningful correlations can also be increased by raising the cutoff. The effect of this on retrieval is shown in Fig. 5, where recall-precision curves for the stem dictionary used directly (without any associations added) and for two different association strategies are compared. When all words, of whatever frequency, are used in the association process, the resulting curve is usually inferior to the normal word matching run. But when the frequencies of words employed in the association process are restricted to the range 6-50, and the cutoff is raised, the resulting recall-precision curve is everywhere superior to the stem curve.

It is also noted in part 3 that words occurring only three or four times have fewer significant occurrences than words of six or more occurrences. The effect on retrieval of variations in the frequencies of words used in the association process is shown in greater detail in Table 9. For both recall and precision purposes, the optimum frequency range appears to be 6-50, although the differences in performance are small. Examination of the recall-precision curves of Figs.
6 and 7 shows the frequently crossing curves, and thus the insensitivity to