IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
by promoting documents which were already retrieved but at a moderately low
level. This is done by increasing the weight of significant words which
previously matched in the request and document by adding associated words
to both. This increase in the weight of the significant words (by addition
of words which co-occur with them) improves performance. This second effect
is in fact responsible for most of the improvement shown by the association
process.
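The promotion effect described above can be illustrated with a minimal sketch. It assumes a cosine correlation between sparse term vectors and a simple rule in which each associate inherits the weight of the word that introduced it. The association list, the sample request and document, and the names `cosine` and `expand` are all hypothetical and are not taken from the report.

```python
from math import sqrt

def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def expand(vector, associations):
    """Add the associates of every term in the vector.

    Assumption: each associate receives the weight of the term that
    introduced it; the report's actual weighting may differ.
    """
    expanded = dict(vector)
    for term, weight in vector.items():
        for assoc in associations.get(term, ()):
            expanded[assoc] = expanded.get(assoc, 0.0) + weight
    return expanded

# Hypothetical association list: words assumed to co-occur with "retrieval".
associations = {"retrieval": ["indexing", "search"]}

# Request and document share a single significant word, "retrieval".
request = {"retrieval": 1.0, "automatic": 1.0}
doc = {"retrieval": 1.0, "library": 1.0, "catalog": 1.0}

before = cosine(request, doc)
after = cosine(expand(request, associations), expand(doc, associations))
print(f"before: {before:.3f}  after: {after:.3f}")
```

Because the matched word brings the same associates into both vectors, the single original match is reinforced by two further matches, and the correlation rises. A document retrieved at a moderately low level would move up the ranking accordingly.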
The precision effect might seem to correspond to the role of the
list of non-significant words in the thesaurus method, just as the recall
effect corresponds to the synonym list. In practice, however, the
non-significant words needed for this purpose are often missed in thesaurus
construction. For example, the thesaurus constructor may easily fail to
recognize the uselessness of "addition", "hand", "order", and "example",
if he does not know that they have occurred only in the combinations "in
addition", "on the other hand", "in order to" and "for example" in the
particular collection. Also, high frequency words, even if they retain
their semantic sense, are often of no value for retrieval because they
occur so often as to provide no discrimination between documents. Unless
the thesaurus is made with the aid of a complete concordance, such errors
are quite likely to occur.
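The kind of check a complete concordance makes possible can be sketched as follows. The function tests whether every occurrence of a word in a collection falls inside one fixed phrase. The function name `only_in_phrase`, the tokenization rule, and the two sample sentences are assumptions made for illustration. Only the four word and phrase pairs come from the text.

```python
import re

def only_in_phrase(texts, word, phrase):
    """True if `word` occurs in `texts` and every occurrence lies inside `phrase`."""
    target = phrase.split()
    offset = target.index(word)  # position of the word within the phrase
    total = bound = 0
    for text in texts:
        # Assumed tokenization: lowercase alphabetic runs only.
        words = re.findall(r"[a-z]+", text.lower())
        for i, w in enumerate(words):
            if w != word:
                continue
            total += 1
            start = i - offset
            if start >= 0 and words[start:start + len(target)] == target:
                bound += 1
    return total > 0 and bound == total

# Hypothetical two-sentence collection.
texts = [
    "In addition, the order of terms matters.",
    "On the other hand, for example, in order to search.",
]

# Word and phrase pairs named in the text.
candidates = {
    "addition": "in addition",
    "hand": "on the other hand",
    "order": "in order to",
    "example": "for example",
}

flags = {w: only_in_phrase(texts, w, p) for w, p in candidates.items()}
print(flags)
```

In this sample, "order" also appears outside "in order to", so it is not flagged, while the other three words occur only inside their fixed phrases and would be candidates for the non-significant list.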
The two effects of the associative process can be seen in Table 6.
This shows the changes in rank position of relevant documents from the word
stem matching process to the associative retrieval run, using a term
frequency range of 6 to 50 and a cutoff of 0.45. The number of documents
which change from each range of rank positions in the associative run is
shown in the main block of the table and the net change in each rank group