Scientific Report No. IRS-13 Information Storage and Retrieval
Word-Word Associations in Document Retrieval Systems
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Document    1       2       3       4       5

Contains    cat     cat     cat     dog     dog
            dog     lion    bear    lion    bear
            fish    mouse   tiger   wolf    mole
            mouse   bird
a) Document - Term Assignment
Term        cat     dog     fish    mouse   lion    bird    bear
Occurs in   1,2,3   1,4,5   1       1,2     2,4     2       3,5
b) Term - Document Assignment
                  Cosine                                   Overlap

r(cat, dog)   =   (1+0+0) / sqrt((1+1+1)(1+1+1))
              =   1/3 = 0.33                               1/3 = 0.33

r(cat, mouse) =   (1+1) / sqrt(3 * 2)
              =   2/sqrt(6) = 0.82                         1/2 = 0.5
c) Computations of Association
Example of Concept-Concept Association Procedure
Fig. 1
For a cutoff of 0.45, "cat" would be found related to "mouse" but not to
"dog". The vector of document 3, after expansion, would then include
"cat, bear, tiger, mouse". If the cutoff were 0.6, and the correlation
mode were overlap, "cat" and "mouse" would also be found to be unrelated.
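
The cosine figures and the 0.45 cutoff decision above can be checked with a short sketch. It is only an illustration, not the procedure as implemented in the report: the names `occurs`, `cosine`, and `related` are hypothetical, the term assignments are binary as in Fig. 1, and the sketch assumes a pair counts as related when its correlation is at least the cutoff. Only the cosine mode is shown, and only the two pairs worked in the figure are considered when expanding document 3.

```python
from math import sqrt

# Term -> set of documents containing it, taken from part (b) of Fig. 1.
occurs = {
    "cat": {1, 2, 3},
    "dog": {1, 4, 5},
    "mouse": {1, 2},
}

def cosine(a, b):
    """Cosine correlation of two terms over binary document assignments."""
    shared = len(occurs[a] & occurs[b])
    return shared / sqrt(len(occurs[a]) * len(occurs[b]))

def related(a, b, cutoff):
    """Assumption: terms are related when the correlation reaches the cutoff."""
    return cosine(a, b) >= cutoff

# Expand document 3 using only the two pairs computed in the figure.
doc3 = ["cat", "bear", "tiger"]
expanded = doc3 + [t for t in ("dog", "mouse") if related("cat", t, 0.45)]

print(round(cosine("cat", "dog"), 2))    # 0.33
print(round(cosine("cat", "mouse"), 2))  # 0.82
print(expanded)                          # ['cat', 'bear', 'tiger', 'mouse']
```

Raising the cutoff above 0.82 in this sketch would leave document 3 unexpanded, since neither pair would then qualify.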