Scientific Report No. IRS-13
Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

"algorithm" and "computer" is a significant pair. To determine the fraction of "significant" pairs on the local basis, the list of pairs was rechecked for significance, and each word was looked up in a concordance of the text to determine its local meaning. The results are shown as a function of frequency in Table 4, and as a function of cutoff in Table 5. Nearly three-quarters of the pairs are now meaningful.

The remaining pairs, which are not composed of related words, are generally stylistic quirks. For example, the word "addition" was used only as part of the phrase "in addition", which appeared in only a few abstracts. The word "addition" was thus associated with the other words in these abstracts even though it had no significant meaning in this collection. More often, however, non-significant pairs arise simply from the accidental preferences of the author of one or more abstracts for certain words. If one abstract contains many instances of one word, a few instances of another word in that same abstract may appear to the association routine as a major amount of overlap.

Non-significant pairs, however, represent only a small fraction of the total number of high-frequency word pairs when local meanings are taken into account. The majority of associations represent such "locally" related words. Overall, about three-quarters of the associations consist of related words, and 80% of these are related only because one of the words has a peculiar meaning in this collection. Fig. 3 shows additional examples of these local meanings. As a result of this peculiarity, the association process is not directly useful for determining word pairs