Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
vector by the association process equally important as a word in the
original document). Weights somewhat below 1 are seen to be preferable,
more so for precision than for recall purposes. Fig. 10 indicates, in
fact, that for high recall, weights above 0.5 do not cause as much loss
in performance. To sum up, then, for high precision, one should have
low weights and high cutoffs; for high recall, higher weights and lower
cutoffs are desirable. Fig. 11 indicates recall-precision curves with
high recall and high precision specifications; as expected, they cross.
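
The trade-off between weight and cutoff can be illustrated with a small sketch. It is not the procedure used in the experiments: the cosine measure, the rule that a term already present keeps its own weight, the function names, and the toy data are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length term-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def associated_pairs(term_docs, cutoff):
    """Map each term to the terms whose association score is >= cutoff."""
    terms = sorted(term_docs)
    pairs = {t: [] for t in terms}
    for i, a in enumerate(terms):
        for b in terms[i + 1:]:
            score = cosine(term_docs[a], term_docs[b])
            if score >= cutoff:
                pairs[a].append((b, score))
                pairs[b].append((a, score))
    return pairs

def expand(vector, pairs, weight):
    """Add associated terms to a vector, each scaled by `weight`.

    Assumption: a term already in the vector keeps its weight; a new
    term receives the largest contribution from any original term.
    """
    expanded = dict(vector)
    for term, value in vector.items():
        for other, _score in pairs.get(term, []):
            if other not in vector:
                expanded[other] = max(expanded.get(other, 0.0), weight * value)
    return expanded

# Hypothetical term-by-document occurrence counts (four documents).
term_docs = {
    "wing":     [2, 1, 0, 0],
    "airfoil":  [2, 1, 0, 0],
    "boundary": [0, 1, 1, 0],
    "heat":     [0, 0, 0, 3],
}
query = {"wing": 1.0}

# Precision-oriented setting: high cutoff, low weight.
high_precision = expand(query, associated_pairs(term_docs, 0.6), 0.5)
# Recall-oriented setting: lower cutoff, higher weight.
high_recall = expand(query, associated_pairs(term_docs, 0.3), 1.0)
```

With the high cutoff only the strongly associated term is added, and at half weight; with the low cutoff a weaker associate enters at full weight, which widens the match at some cost in precision.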
It was also seen in part 3 that additional iteration of the
association process is not useful in finding synonyms, and it is also not
of great value in retrieval. Fig. 12 shows curves for 0, 1, and 2
iterations of the association procedure, with frequencies of 6-50 and
a cutoff of 0.60. The first iteration curve is seen to be superior.
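
The iteration count and the frequency restriction can be sketched in the same spirit. The frequency range of 6-50 is taken from the text; everything else is assumed for illustration, in particular the reading of "iteration" as reapplying the expansion step to the already expanded vector, the weight of 0.5, and the toy pairs and frequencies.

```python
def in_frequency_range(doc_freq, low=6, high=50):
    """Return the terms whose document frequency lies in [low, high]."""
    return {t for t, f in doc_freq.items() if low <= f <= high}

def expand_once(vector, pairs, weight, allowed):
    """One association step; terms outside `allowed` are ignored."""
    expanded = dict(vector)
    for term, value in vector.items():
        if term not in allowed:
            continue
        for other in pairs.get(term, ()):
            if other in allowed and other not in vector:
                expanded[other] = max(expanded.get(other, 0.0), weight * value)
    return expanded

def iterate_association(vector, pairs, weight, allowed, iterations):
    """Apply the association step `iterations` times (0 leaves the vector as is)."""
    for _ in range(iterations):
        vector = expand_once(vector, pairs, weight, allowed)
    return vector

# Hypothetical associated pairs (already past the cutoff) and frequencies.
pairs = {
    "flow": ["boundary"],
    "boundary": ["flow", "layer"],
    "layer": ["boundary", "the"],
    "the": ["layer"],
}
doc_freq = {"flow": 30, "boundary": 12, "layer": 9, "the": 200}
allowed = in_frequency_range(doc_freq)

query = {"flow": 1.0}
runs = [iterate_association(query, pairs, 0.5, allowed, n) for n in (0, 1, 2)]
```

In this sketch each further iteration reaches terms one more step removed from the original vector and at a geometrically smaller weight, so the second iteration adds only distant, lightly weighted terms.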
The performance differences shown by the various options in
the association process are rather small. It is difficult, in
particular, to choose a set of options to maximize either the precision
effect or the recall effect over an entire set of requests. Nor does
a fine adjustment of cutoff, frequency, or weight have a major effect on
retrieval performance. This is just what is expected from the analysis
of the associated pairs, since no set of parameters produces an unusual
number of significant pairs. In general, the use of associated pairs
produces improvement in performance over most of the range compared with
word-stem matching if words with very low and very high frequencies are omitted.
Procedures which decrease the number of associated pairs (restricting the
frequency range used, raising the cutoff) or lower the weight of the