ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
chapter
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
After the list of word-uses to be included in the thesaurus is
available, it becomes necessary to group them into thesaurus classes.
This can be done in various ways:
1) an informal judgment can be made for each pair of word-uses
to decide whether, in the subject area under consideration, they
are synonymous, and if so, they can be grouped in the same
thesaurus class;
2) a set of "syntactic frames" can be used, and those word-uses
which fit into the same frames can be collected in the same
thesaurus group, or, equivalently, a decision is made based
on whether term A can replace term B in a given context
[9]. This decision is of course not mechanized, but the
dictionary maker is faced only with local choices within
certain narrow limits;
3) a set of questions can be prepared designed to elicit answers
about the terms to be grouped, and each term can be identified
by the set of answers obtained in response to the proposed
questions; for example, one might ask "does this term represent
a physical object or process, or does it represent an abstraction,
or is this question inapplicable"; a score of 1 may then be
assigned for a physical object, 2 for an abstraction, and 3 if
the question is not applicable.
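The question-and-answer scheme of item 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the first question and its 1/2/3 scoring follow the example above, while the second question, the names QUESTIONS and encode_term, and the sample terms and answers are hypothetical.

```python
# Each question maps its possible answers to a numeric score.
# Question 1 and its 1/2/3 scoring follow the example in the text;
# question 2 is a hypothetical addition for illustration only.
QUESTIONS = [
    {"physical": 1, "abstraction": 2, "inapplicable": 3},
    {"yes": 1, "no": 2, "inapplicable": 3},  # hypothetical second question
]


def encode_term(answers):
    """Convert one answer per question into a tuple of numeric scores."""
    if len(answers) != len(QUESTIONS):
        raise ValueError("one answer is required per question")
    return tuple(scores[answer] for scores, answer in zip(QUESTIONS, answers))


# Illustrative terms and answers (assumed, not taken from the report).
profiles = {
    "transistor": encode_term(["physical", "yes"]),
    "entropy": encode_term(["abstraction", "no"]),
}
```

Each term is then identified by its tuple of scores, one entry per question.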
At the end of such a procedure, each term is then identified by a set of
properties (in the form of contexts which fit a given term, or in the form
of answers to questions about the terms), and the complete vocabulary
may be represented by a property matrix, as shown in simplified form in
Fig. 18. It remains, then, to find the semantic distance between terms by
comparing the rows of properties representing the respective word-uses.
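As a rough illustration of comparing rows, the sketch below counts the properties on which two word-uses disagree. The text does not fix a particular distance measure, so this mismatch count is an assumption, and the function name row_distance and the matrix values are hypothetical (they are not the values of Fig. 18).

```python
def row_distance(row_a, row_b):
    """Count the properties on which two word-uses differ (an assumed measure)."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must cover the same properties")
    return sum(1 for a, b in zip(row_a, row_b) if a != b)


# Hypothetical property matrix: one row per word-use, one column per property.
matrix = {
    "term_a": (1, 2, 3, 1),
    "term_b": (1, 2, 3, 1),
    "term_c": (1, 3, 3, 2),
}

same = row_distance(matrix["term_a"], matrix["term_b"])       # identical rows
different = row_distance(matrix["term_a"], matrix["term_c"])  # two mismatches
```

Under this measure a distance of zero means the two rows agree on every property.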
Specifically, rows which are completely identical can be coalesced
into a single group immediately; terms which are not identical may be