Scientific Report No. IRS-13, Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems (chapter), M. E. Lesk
Harvard University, Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

by promoting documents which were already retrieved, but at a moderately low level. This is done by adding associated words to both the request and the document, which increases the weight of the significant words that already matched between them. This increase in the weight of the significant words (through the addition of words which co-occur with them) improves retrieval performance. This second effect is in fact responsible for most of the improvement shown by the association process.

The precision effect might seem to correspond to the role of the list of non-significant words in the thesaurus method, just as the recall effect corresponds to the synonym list. In practice, however, the non-significant words needed for this purpose are often missed in thesaurus construction. For example, the thesaurus constructor may easily fail to recognize the uselessness of "addition", "hand", "order", and "example" if he does not know that they have occurred only in the combinations "in addition", "on the other hand", "in order to", and "for example" in the particular collection. Also, high-frequency words, even if they retain their semantic sense, are often of no value for retrieval because they occur so often that they provide no discrimination between documents. Unless the thesaurus is made with the aid of a complete concordance, such errors are quite likely to occur.

The two effects of the associative process can be seen in Table 6, which shows the changes in rank position of relevant documents between the word stem matching process and the associative retrieval run, using a term frequency range of 6 to 50 and a cutoff of 0.45.
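The precision effect described above can be illustrated with a short sketch. It is a minimal illustration under stated assumptions and not the procedure used in this report. The association measure (cosine between the document-incidence vectors of two terms), the expansion weighting (term weight times association score), and the cosine match between request and document are all assumptions. The function names `term_associations`, `expand`, and `cosine` are hypothetical. The defaults `min_freq=6`, `max_freq=50`, and `cutoff=0.45` mirror the term frequency range and cutoff quoted in the text, but reading "frequency" as the number of documents containing a term is also an assumption. The demo uses a hand-written association table rather than one derived from a real collection.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def term_associations(docs, min_freq=6, max_freq=50, cutoff=0.45):
    """Pair up terms whose document-incidence vectors have cosine >= cutoff.

    Assumed measure. Only terms occurring in min_freq..max_freq documents are
    eligible, which excludes both rare words and very frequent,
    non-discriminating words.
    """
    postings = {}
    for doc_id, doc in enumerate(docs):
        for term in set(doc):
            postings.setdefault(term, set()).add(doc_id)
    eligible = sorted(t for t, ids in postings.items()
                      if min_freq <= len(ids) <= max_freq)
    assoc = {}
    for a, b in combinations(eligible, 2):
        overlap = len(postings[a] & postings[b])
        # Cosine of two 0/1 vectors: |A & B| / sqrt(|A| * |B|).
        score = overlap / math.sqrt(len(postings[a]) * len(postings[b]))
        if score >= cutoff:
            assoc.setdefault(a, {})[b] = score
            assoc.setdefault(b, {})[a] = score
    return assoc


def expand(vector, assoc):
    """Add each term's associates, weighted by term weight * association (assumed scheme)."""
    expanded = dict(vector)
    for term, weight in vector.items():
        for other, score in assoc.get(term, {}).items():
            expanded[other] = expanded.get(other, 0.0) + weight * score
    return expanded


# Hand-written, illustrative association table: "thesaurus" is a significant
# word with an associate; "order" (as in "in order to") has none.
assoc = {"thesaurus": {"synonym": 0.6}, "synonym": {"thesaurus": 0.6}}
request = {"thesaurus": 1.0, "order": 1.0}
doc_a = {"thesaurus": 1.0, "index": 1.0}   # matches on the significant word
doc_b = {"order": 1.0, "index": 1.0}       # matches only on "order"

before = (cosine(request, doc_a), cosine(request, doc_b))
after = (cosine(expand(request, assoc), expand(doc_a, assoc)),
         cosine(expand(request, assoc), expand(doc_b, assoc)))
print("before:", [round(x, 3) for x in before])
print("after: ", [round(x, 3) for x in after])
```

In the demo, both documents start with the same correlation to the request. After expansion, the document sharing the significant word "thesaurus" rises, because the associate "synonym" is added to both vectors and reinforces the existing match. The document matching only on "order" falls slightly, because the expanded request vector is longer while its match is unchanged. The associates therefore strengthen matches that already existed rather than creating new ones, which is the precision effect the text describes.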
The number of documents which change from each range of rank positions in the associative run is shown in the main block of the table and the net change in each rank group