Scientific Report No. IRS-13
Information Storage and Retrieval

Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University

Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Document     1       2       3       4       5

Contains     cat     cat     cat     dog     dog
             dog     lion    bear    lion    bear
             fish    mouse   tiger   wolf    mole
             mouse   bird

a) Document-Term Assignment

Term         cat     dog     fish    mouse   lion    bird
Occurs in    1,2,3   1,4,5   1       1,2     2,4     [illegible]

b) Term-Document Assignment

Cosine:

\[ r_{cat,dog} = \frac{1+0+0}{\sqrt{(1+1+1)(1+1+1)}} = \frac{1}{3} = 0.33 \]

\[ r_{cat,mouse} = \frac{1+1}{\sqrt{3 \cdot 2}} = \frac{2}{\sqrt{6}} = 0.82 \]

Overlap:

\[ r_{cat,dog} = \frac{1}{3} = 0.33 \]

\[ r_{cat,mouse} = \frac{1}{2} = 0.5 \]

c) Computations of Association

Fig. 1. Example of Concept-Concept Association Procedure

For a cutoff of 0.45, "cat" would be found related to "mouse" but not to "dog". The vector of document 3, after expansion, would then include "cat, bear, tiger, mouse". If the cutoff were 0.6, and the correlation mode were overlap, "cat" and "mouse" would also be found to be unrelated.
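
The cosine computations of Fig. 1 and the expansion of document 3 at a cutoff of 0.45 can be reproduced with a minimal Python sketch. It rests on several assumptions not stated in the text: term vectors are treated as binary, a pair counts as related when its correlation is greater than or equal to the cutoff, and only the two pairs computed in the figure ("cat, dog" and "cat, mouse") are considered as expansion candidates. The names `occurs_in`, `cosine`, and `expand` are hypothetical and chosen for illustration only. The overlap mode is omitted because its formula does not appear in this passage.

```python
from math import sqrt

# Term -> set of documents, copied from the legible entries of Fig. 1(b).
occurs_in = {
    "cat": {1, 2, 3},
    "dog": {1, 4, 5},
    "mouse": {1, 2},
}


def cosine(a, b):
    """Cosine correlation of two binary term vectors given as document sets."""
    shared = len(a & b)  # number of documents containing both terms
    return shared / sqrt(len(a) * len(b))


def expand(doc_terms, candidates, cutoff):
    """Append each candidate term that correlates with a document term.

    Assumption: "related" means correlation >= cutoff.
    """
    expanded = list(doc_terms)
    for term in doc_terms:
        if term not in occurs_in:
            continue  # no occurrence data for this term in the sketch
        for other in candidates:
            if other in expanded:
                continue
            if cosine(occurs_in[term], occurs_in[other]) >= cutoff:
                expanded.append(other)
    return expanded


r_cat_dog = cosine(occurs_in["cat"], occurs_in["dog"])
r_cat_mouse = cosine(occurs_in["cat"], occurs_in["mouse"])

# Document 3 contains "cat, bear, tiger" before expansion.
doc3 = expand(["cat", "bear", "tiger"], candidates=["dog", "mouse"], cutoff=0.45)

print(round(r_cat_dog, 2), round(r_cat_mouse, 2))  # 0.33 0.82
print(doc3)  # ['cat', 'bear', 'tiger', 'mouse']
```

Representing each term by the set of documents in which it occurs keeps the binary cosine to a single intersection and two set sizes, which mirrors the term-document assignment of Fig. 1(b).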