ISR11 Scientific Report No. ISR-11, Information Storage and Retrieval. Information Analysis and Dictionary Construction (chapter). G. Salton, M. E. Lesk. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Given such a term-document matrix or graph, it is now possible, by well-known statistical association methods, to compute similarity coefficients between terms, based on co-occurrence characteristics of the terms in the documents of the collection. The similarity coefficient between each pair of terms can then be made to depend on the frequency with which the terms are jointly assigned to the documents of a collection. In Fig. 15, for example, it may be noted that terms T1 and T6 are both assigned to documents D1 and D[?] (although with differing weights), while neither is assigned to documents D2 and D3. As a result, the term association process may assign these two terms to a common thesaurus category.

For the example of Fig. 15 an associative procedure might result in the formation of three term (thesaurus) groups, consisting respectively of terms T1 and T6 (because of joint assignment to documents D1 and D[?]), terms T7 and T[?] (because of joint assignment to D1 and D2), and finally terms T2, T3 and T5 (because of joint assignment to D3 and D[?]).

The result of a term association process may then be displayed as an association map, in which branches between terms represent term relations, or, alternatively, thesaurus groupings. An excerpt from a typical term association map is shown in Fig. 16.[?,7,8] The thesaurus groupings suggested by the map of Fig. 16 can be found by inspection.
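The association process described above can be illustrated with a short sketch. The report does not name a particular similarity coefficient, so the cosine measure used here is an assumption, as is the grouping rule (terms linked by any chain of above-threshold similarities fall into one group). The term names, the weights in TERM_DOC, and the threshold of 0.5 are all invented for illustration and do not reproduce the values of Fig. 15.

```python
from itertools import combinations
from math import sqrt

# Hypothetical weighted term-document matrix: one row per term, one column
# per document. The weights are invented and are NOT those of Fig. 15.
TERM_DOC = {
    "alpha": [2, 0, 0, 1],
    "beta":  [1, 0, 0, 3],
    "gamma": [0, 2, 1, 0],
    "delta": [0, 1, 2, 0],
    "eps":   [0, 3, 1, 0],
}


def cosine(u, v):
    """Cosine similarity of two weight vectors (assumed coefficient)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def term_similarities(matrix):
    """Similarity coefficient for every unordered pair of terms."""
    return {
        (s, t): cosine(matrix[s], matrix[t])
        for s, t in combinations(sorted(matrix), 2)
    }


def group_terms(matrix, threshold=0.5):
    """Group terms connected by similarities at or above the threshold.

    The pairs that pass the threshold are the branches of an association
    map; the groups are its connected components (union-find).
    """
    parent = {t: t for t in matrix}

    def find(t):
        # Follow parent links to the group representative, compressing paths.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for (s, t), sim in term_similarities(matrix).items():
        if sim >= threshold:
            parent[find(s)] = find(t)

    groups = {}
    for t in matrix:
        groups.setdefault(find(t), []).append(t)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    for pair, sim in term_similarities(TERM_DOC).items():
        print(pair, round(sim, 3))
    print(group_terms(TERM_DOC))
```

With these invented weights, terms that share documents (and share none with the other terms) end up in the same group, which mirrors the joint-assignment argument made for Fig. 15. A different coefficient or threshold would in general produce different groupings.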
B) Semi-Automatic Methods

The methods outlined in the preceding part are based on the assumption that term co-occurrences in documents, or joint assignment of terms to documents, are indicative of term similarity or relatedness. This assumption