IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
that should be connected in a thesaurus. It can be used, however, to
point to word relations not normally apparent, and thus it serves as an
aid to dictionary constructors who are working with a known collection.
It should be noted again that these experiments were run on a
collection of 40,000 words. It may well be that in larger collections,
the apparent meanings of words approximate their common meanings more
closely. This point will be the subject of future investigation, but the
presence of apparently meaningless correlations has already been noted by
workers with much larger collections. [1]
The properties of second-order associations were also investigated.
These are word pairs that need not co-occur in any document but must
have common first-order associations. Almost all second-order associations,
however, were also found to be first-order associated terms. They generally
arise from large blocks of words, all of which were used to discuss some
subject, and all of which were first-order associations of each other.
For example, the words "height", "atmosphere", "density", "km",
etc. are all used in a set of documents about the measurement of the
density of the upper atmosphere. They were all identified as first-order
associations, and all became second-order associations. Stylistic quirks
were not eliminated by the repetition of the correlation process, and the
total number of associations was diminished by a factor of 8 to 10.
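
The distinction between first-order and second-order associations drawn
above can be illustrated with a minimal sketch. It assumes a cosine
correlation between word vectors and a fixed cutoff of 0.4; neither the
measure nor the cutoff is taken from the experiments, and the function
names, the toy documents, and the placeholder words "p", "q", and "r" are
all hypothetical. The sketch shows only how the two kinds of association
are defined and does not reproduce the reported results.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine correlation of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = sqrt(sum(weight * weight for weight in u.values()))
    norm_v = sqrt(sum(weight * weight for weight in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def word_vectors(documents):
    """Describe each word by the documents in which it occurs."""
    vecs = {}
    for doc_id, words in documents.items():
        for word in words:
            vecs.setdefault(word, {})
            vecs[word][doc_id] = vecs[word].get(doc_id, 0) + 1
    return vecs


def associations(vectors, threshold):
    """Unordered word pairs whose vectors correlate at or above threshold."""
    return {
        (a, b)
        for a, b in combinations(sorted(vectors), 2)
        if cosine(vectors[a], vectors[b]) >= threshold
    }


def association_vectors(words, pairs):
    """Describe each word by the words it is already associated with."""
    vecs = {word: {} for word in words}
    for a, b in pairs:
        vecs[a][b] = 1
        vecs[b][a] = 1
    return vecs


# Toy collection (invented): one block of mutually related words, plus two
# words "p" and "r" that never co-occur but both co-occur with "q".
documents = {
    "d1": ["height", "atmosphere", "density", "km"],
    "d2": ["height", "atmosphere", "density"],
    "d3": ["atmosphere", "density", "km"],
    "d4": ["p", "q"],
    "d5": ["q", "r"],
}

THRESHOLD = 0.4  # assumed cutoff, chosen only for this illustration

by_document = word_vectors(documents)
first_order = associations(by_document, THRESHOLD)

# Repeat the correlation process on the first-order associations themselves.
by_association = association_vectors(by_document, first_order)
second_order = associations(by_association, THRESHOLD)

print(sorted(second_order - first_order))  # [('p', 'r')]
```

In this toy collection every pair within the atmosphere block is both a
first-order and a second-order association, while ("p", "r") is the only
second-order pair that is not also a first-order pair.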
Second-order associations did not produce useful synonyms; even the one or
two useful synonyms in the first-order associations (e.g. "error", as in
"error function", and "erfc", its abbreviation) tended to disappear in
second-order, as did most other associations. The use of second-order