Scientific Report No. ISR-11, Information Storage and Retrieval. Information Analysis and Dictionary Construction. G. Salton, M. E. Lesk. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

weights of the properties are reasonably similar for both terms, so that neither term dominates the other, and they are placed in the same concept class;

3) terms A and B are identified by the same properties, but the property weights are higher for term A than for term B; then A may be said to dominate B, and may be placed on a higher level in the hierarchy;

4) terms A and B are identified by the same properties, and B dominates A.

To decide how similar two property vectors are, it is necessary to compute a similarity coefficient between them. In the present context, it is best to use an asymmetric coefficient, such that the similarity between term i and term j is not necessarily the same as that between term j and term i. Given property vectors $v^i$ and $v^j$, representing terms i and j respectively, a possible similarity measure is

$$c_{ij} = \frac{\sum_k \min(v^i_k, v^j_k)}{\sum_k v^i_k}$$

where $v^i_k$ is the weight of property k in vector $v^i$.

Using this measure, a term-term correlation matrix can now be constructed, giving for each pair of terms the similarity measure $c_{ij}$. Note that if the two vectors $v^i$ and $v^j$ are identical, then $c_{ij}$ equals 1, and when $v^i$ and $v^j$ have no common properties, then $c_{ij}$ equals 0. A cut-off value K may now be applied to the similarity coefficients, and a hierarchy may be formed based on the following algorithm: [11]
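Before turning to the algorithm, the asymmetric coefficient and the cut-off can be illustrated with a minimal Python sketch. It assumes that the coefficient is the sum of the component-wise minima divided by the sum of the weights of the first vector, which agrees with the two limiting cases stated above (1 for identical vectors, 0 for vectors with no common properties). The function names, the example weights, the treatment of an all-zero vector, and the use of "greater than or equal to K" as the cut-off test are assumptions made for illustration only; they are not taken from the report.

```python
from typing import List, Sequence, Tuple


def asymmetric_similarity(v_i: Sequence[float], v_j: Sequence[float]) -> float:
    """Sum of component-wise minima of v_i and v_j, divided by the sum of v_i."""
    if len(v_i) != len(v_j):
        raise ValueError("property vectors must have the same length")
    total = sum(v_i)
    if total == 0:
        # Assumption: a term with no weighted properties is similar to nothing.
        return 0.0
    return sum(min(a, b) for a, b in zip(v_i, v_j)) / total


def correlation_matrix(vectors: Sequence[Sequence[float]]) -> List[List[float]]:
    """Term-term matrix whose entry [i][j] is the similarity of term i to term j."""
    return [[asymmetric_similarity(vi, vj) for vj in vectors] for vi in vectors]


def pairs_at_or_above_cutoff(
    matrix: Sequence[Sequence[float]], cutoff: float
) -> List[Tuple[int, int]]:
    """Ordered pairs (i, j), i != j, whose coefficient reaches the cut-off.

    Assumption: the comparison is ">= cutoff"; the report does not say
    whether the test is strict.
    """
    return [
        (i, j)
        for i, row in enumerate(matrix)
        for j, c in enumerate(row)
        if i != j and c >= cutoff
    ]


# Hypothetical property weights: A and B share the same properties,
# with higher weights for A; C shares no properties with either.
term_a = [4, 2, 0]
term_b = [2, 1, 0]
term_c = [0, 0, 3]

matrix = correlation_matrix([term_a, term_b, term_c])
print(matrix[0][1])  # similarity of A to B: (2 + 1 + 0) / 6 = 0.5
print(matrix[1][0])  # similarity of B to A: (2 + 1 + 0) / 3 = 1.0
print(pairs_at_or_above_cutoff(matrix, 0.75))  # [(1, 0)]
```

In this example the two coefficients for the pair A, B differ (0.5 against 1.0) because the property weights of A are higher than those of B, which is the asymmetry the measure is meant to expose. How the hierarchy algorithm uses that difference is not shown here.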