ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
if C_ij and C_ji are both below the cut-off value K, then
terms i and j are unrelated;
if C_ij and C_ji are both above cut-off, then terms i and
j are synonymous and are placed in the same thesaurus
category;
if C_ij is below cut-off and C_ji above cut-off, then term
i is a parent of term j in the hierarchical arrangement;
finally, if C_ij is above cut-off and C_ji below cut-off, then term
j is a parent of term i.
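The four rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and enum names are hypothetical, the coefficients C_ij and C_ji are taken as already computed, and a value exactly equal to K is treated as "below" the cut-off, a choice the text does not specify.

```python
from enum import Enum


class Relation(Enum):
    UNRELATED = "unrelated"
    SYNONYMOUS = "synonymous"
    I_PARENT_OF_J = "i is a parent of j"
    J_PARENT_OF_I = "j is a parent of i"


def classify(c_ij: float, c_ji: float, k: float) -> Relation:
    """Apply the four cut-off rules to one pair of coefficients.

    Assumption: a coefficient equal to k counts as below the cut-off.
    """
    ij_above = c_ij > k
    ji_above = c_ji > k
    if not ij_above and not ji_above:
        return Relation.UNRELATED
    if ij_above and ji_above:
        return Relation.SYNONYMOUS
    if ji_above:
        # C_ij below cut-off, C_ji above cut-off
        return Relation.I_PARENT_OF_J
    # C_ij above cut-off, C_ji below cut-off
    return Relation.J_PARENT_OF_I
```

Calling `classify(0.2, 0.8, 0.5)`, for instance, falls under the third rule and makes term i a parent of term j; the numeric values here are invented for illustration.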
This system may not generate a true tree structure, since a given
term may have more than one assigned parent. The method is, however,
fully automatic, and a manual revision after the initial generation can
be used to modify the resulting hierarchy to make it acceptable. This can
be accomplished, for example, by introducing cross-references between terms
in the hierarchy to replace the connections which are not compatible with
the tree organization. A set of sample vectors is treated in the suggested
manner in Fig. 22. It is seen that property vectors which intuitively
appear to be similar will in fact be classified as synonymous (case 1),
vectors which appear unrelated are classified as unrelated (case 2), and
vectors for which an inclusion relation is apparent are assigned a
hierarchical ranking.
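The multiple-parent situation noted above, and its repair through cross-references, can be sketched as follows. This is a hypothetical helper, not the report's procedure: it keeps the first parent encountered for each term and turns every further parent link into a cross-reference. Which parent to keep is an arbitrary assumption here; the text leaves that decision to manual revision. The sketch also does not check for cycles.

```python
def split_tree_and_cross_references(parent_links):
    """Separate parent links into a tree and a list of cross-references.

    parent_links: iterable of (parent, child) pairs produced by the
    cut-off rules. Returns (tree, cross_refs), where tree maps each
    child to the single parent that is kept, and cross_refs lists the
    remaining (parent, child) links.

    Assumption: the first parent seen for a child is the one kept.
    """
    tree = {}
    cross_refs = []
    for parent, child in parent_links:
        if child not in tree:
            tree[child] = parent
        else:
            # A second parent would break the tree structure, so the
            # link is recorded as a cross-reference instead.
            cross_refs.append((parent, child))
    return tree, cross_refs


# Invented example terms, for illustration only.
links = [("animal", "dog"), ("pet", "dog"), ("animal", "cat")]
tree, cross_refs = split_tree_and_cross_references(links)
```

In this example "dog" has two candidate parents, so one link stays in the tree and the other becomes a cross-reference that a reviser can inspect.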
Various procedures have been suggested for updating hierarchies and
dictionaries by addition of new terms and deletion of old ones.[11,12]
These must be used in conjunction with the dictionary look-up operations
in any operating situation.