Scientific Report No. IRS-13
Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

that should be connected in a thesaurus. It can be used, however, to point to word relations not normally apparent, and thus it serves as an aid to dictionary constructors who are working with a known collection.

It should be noted again that these experiments were run on a collection of 40,000 words. It may well be that in larger collections the apparent meanings of words approximate their common meanings more closely. This point will be the subject of future investigation, but the presence of apparently meaningless correlations has already been noted by workers with much larger collections. [1]

The properties of second-order associations were also investigated. These are word pairs that need not co-occur in any document but must have first-order associations in common. Almost all second-order associations, however, were also found to be first-order associated terms. They generally arise from large blocks of words, all of which were used to discuss some subject, and all of which were first-order associations of each other. For example, the words "height", "atmosphere", "density", "km", etc. are all used in a set of documents about the measurement of the density of the upper atmosphere. They were all identified as first-order associations, and all became second-order associations. Stylistic quirks were not eliminated by repeating the correlation process, and the total number of associations was diminished by a factor of 8 to 10. Second-order associations did not produce useful synonyms; even the one or two useful synonyms in the first-order associations (e.g.
"error", as in "error function", and "erfc", its abbreviation) tended to disappear in second-order, as did most other associations. The use of second-order