Scientific Report No. IRS-13, Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

which are too frequent in this collection to be of much use as search terms. However, none of those words had any related pairs, while "dissociated" introduces eight new words. As a result, while "dissociated" represented only 8% of the original query, it and its associations represented 28% of the new query. The additional weight given to this important term (since all its associations are also introduced into any document which contains the word) causes three documents in rank positions 21, 23, and 27 to be promoted to positions 1, 6, and 7. Note that "dissociation" already appears in these documents before expansion, but it is not emphasized enough.

Recall-effect improvement (introducing new terms missed in the original search) is illustrated by a question in the ADI collection, QB2, on the "testing of automatic information systems." This query fails to match one relevant document which deals with the "evaluation of documentation techniques". The association procedure connects "automated" in the query with "experiment" and "reduce"; "reduce" in turn is related to "documentation". This provides enough overlap to raise the document from 77th place in the rank list of retrieved documents to 9th. It should be noted that the useful relations are locally significant pairs (e.g. "automated" and "experimented"; "experiment" and "test" are not associated).

An example from the Cranfield collection is query 226, whose key term is "Navier-Stokes" (equation). Document 08C does not contain this word, but it was introduced by the association procedure from the word "steady".
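The expansion mechanism behind these examples can be illustrated with a small sketch. It is not the report's implementation: the term names (`term_a`, `assoc_1`, and so on), the association table, the half-weight given to introduced terms (`assoc_weight`), and the use of cosine correlation for ranking are all assumptions chosen only to show how a term with many associates gains weight once both query and document are expanded.

```python
import math
from collections import defaultdict

def expand(vector, associations, assoc_weight=0.5):
    """Add the associates of every term in `vector`, scaled by `assoc_weight`.

    `associations` maps a term to the terms it is paired with. The scaling
    factor is an assumption of this sketch, not a value from the report.
    """
    expanded = defaultdict(float)
    for term, weight in vector.items():
        expanded[term] += weight
        for related in associations.get(term, ()):
            expanded[related] += assoc_weight * weight
    return dict(expanded)

def share(vector, terms):
    """Fraction of the vector's total weight carried by `terms`."""
    total = sum(vector.values())
    return sum(vector.get(t, 0.0) for t in terms) / total

def cosine(a, b):
    """Cosine correlation between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b)

# Hypothetical data: only "dissociated" has associates, as in the example
# above, but the counts and weights here are invented for illustration.
associations = {"dissociated": ["assoc_1", "assoc_2"]}
query = {"term_a": 1.0, "term_b": 1.0, "term_c": 1.0, "dissociated": 1.0}
document = {"term_x": 1.0, "term_y": 1.0, "term_z": 1.0, "dissociated": 1.0}

key_terms = ["dissociated", "assoc_1", "assoc_2"]
expanded_query = expand(query, associations)
expanded_document = expand(document, associations)

share_before = share(query, key_terms)
share_after = share(expanded_query, key_terms)
score_before = cosine(query, document)
score_after = cosine(expanded_query, expanded_document)

print(f"share of key term and associates: {share_before:.2f} -> {share_after:.2f}")
print(f"query-document correlation: {score_before:.3f} -> {score_after:.3f}")
```

In this toy case the key term's share of the query weight rises from 25% to 40%, and the correlation with a document that already contains the term increases because the same associates are introduced on both sides. That is the precision-type effect described above; the recall-type effect corresponds to the case where the introduced associates create overlap with a document that shared no terms with the original query.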
The word "numerical" was introduced into both query and document from "Navier-Stokes" and "steady", respectively. Again, note the local