Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
may not always hold, and if it holds, its applicability may be restricted
to a given document collection rather than to a complete subject field.
For this reason, it is of interest to consider also somewhat less radical
procedures which avail themselves of a certain amount of human judgment.
These methods are generally based on various automatic aids, but use subject
experts for the basic task of defining the meaning of each term being
introduced into the thesaurus. [9,10,11,12]
The basic idea is to start with a word frequency list, as before,
for the words included in a given document collection. In addition, it is
also useful to have available a listing which exhibits the words in context,
so that a distinction may be made between individual word-uses for ambiguous
terms. For example, a word such as "base" may be broken down into "base1",
"base2", and "base3", to represent, respectively, "army base", "lamp base",
and "baseball base" (assuming that those three uses of the term are in fact
present in a given collection). A standard "keyword-in-context" (KWIC)
list may be prepared automatically, to permit a human observer to ascertain
the individual word-uses for the terms included in a collection. An
example of a typical KWIC index list, used in conjunction with the SMART
system, is shown in Fig. 17. [1[OCRerr]]
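The two automatic aids described above, a word frequency list and a
keyword-in-context listing, can be illustrated with a minimal sketch. The
sketch is not the SMART implementation; the sample collection DOCS, the
function names tokenize, word_frequencies, and kwic, and the context width
are all assumptions introduced here for illustration only.

```python
import re
from collections import Counter

# Hypothetical three-document collection, invented for illustration only.
DOCS = [
    "the spectral norm of a square matrix",
    "the square root of the norm",
    "a lamp base and an army base",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(docs):
    """Count how often each word occurs across the whole collection."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def kwic(docs, keyword, width=2):
    """Return one (doc_index, left_context, keyword, right_context) tuple
    per occurrence of keyword, with up to `width` words on each side."""
    entries = []
    for doc_id, doc in enumerate(docs):
        tokens = tokenize(doc)
        for i, tok in enumerate(tokens):
            if tok == keyword:
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                entries.append((doc_id, left, tok, right))
    return entries


if __name__ == "__main__":
    # Word frequency list for the collection.
    for word, count in word_frequencies(DOCS).most_common(5):
        print(f"{count:3d}  {word}")
    # KWIC lines for one ambiguous term, keyword aligned in a center column.
    for doc_id, left, key, right in kwic(DOCS, "square"):
        print(f"{left:>20}  {key}  {right:<20}  (doc {doc_id})")
```

A human observer would read the KWIC lines for a term and decide how many
distinct word-uses to record; the sketch only prepares the listing and makes
no attempt to separate the senses automatically.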
Fig. 17 shows that the term "spectral" is used in the given collection
in only one sense, namely that of a "spectral norm"; the term "square" is,
however, used in two senses in the concordance excerpt, first as a
rectangle of equal sides (square matrix), and then as a power of two (square
root). The list of word-uses to be constructed would then include a
single instance of the term "spectral", but two separate examples of
"square".