Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
associations, are best for precision; procedures which increase the
number of associated pairs or their weight are best for recall, but the
effect is small.
It is often hypothesized that association procedures can simulate
the operation of a thesaurus or other classical word normalization procedures.
This can be tested on the Cranfield collection, since a thesaurus is
available as well as a form of indexing. The indexing, although very
detailed and exhaustive (averaging over 30 terms per abstract), is not
carried through a rigorous term normalization, and the results with it
may perhaps be unusual. It is felt, however, that in terms of overall
performance, the exhaustivity and high quality of the indexing compensate
for the lack of normalization, so that results should be roughly comparable.
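The chance baseline invoked in the comparison that follows can be checked
with a short calculation. The sketch below assumes that the methods improve
requests independently of one another, which is the reading of "on a random
basis" adopted here; the function name expected_joint_improvements is
illustrative and does not come from the report. It takes the figures quoted
in the next paragraph (42 requests, of which 24, 24 and 25 are improved by
thesaurus, indexing and association respectively) and returns the expected
number of requests improved by every method at once.

```python
from fractions import Fraction


def expected_joint_improvements(total, improved_counts):
    """Expected number of requests improved by every method, ASSUMING the
    methods improve requests independently of one another."""
    expected = Fraction(total)
    for count in improved_counts:
        # Each method improves a fraction count/total of the requests.
        expected *= Fraction(count, total)
    return expected


# Figures quoted in the text: 42 requests; thesaurus improves 24,
# indexing improves 24, association improves 25.
baseline = expected_joint_improvements(42, [24, 24, 25])
print(baseline)                    # 400/49
print(round(float(baseline), 2))   # 8.16
```

Under the independence assumption the expectation is 400/49, or roughly 8
requests, which is the figure against which the 12 requests actually
improved by all three methods is to be compared.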
As expected from the earlier discussions in this section, the
association procedure operates in a unique and virtually independent way,
simulating neither indexing nor thesaurus. Out of 42 requests, the
thesaurus improves the performance of 24, the indexing improves the
performance of 24, and the association procedure (with a frequency range
of 6-100, cutoff of .45, weight of 1.0) improves 25. Yet only 12 requests
are improved by all three methods (even on a random basis about 8 would
be improved by all three). Table 12 shows two-by-two contingency tables
for co-improvement of requests by thesaurus and association, and Table 13
shows two-by-two contingency tables for co-improvement of requests by
indexing and association. None of the tables is significant, i.e. there
is no co-variation between association results and thesaurus results, or
between association results and indexing results. A set of
recall-precision curves for thesaurus and associations is shown in
Fig. 13, and for indexing and association in