Scientific Report No. ISR-11, Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton and M. E. Lesk, Harvard University
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

may not always hold, and if it holds, its applicability may be restricted to a given document collection rather than to a complete subject field. For this reason, it is also of interest to consider somewhat less radical procedures that rely on a certain amount of human judgment. These methods are generally based on various automatic aids, but use subject experts for the basic task of defining the meaning of each term being introduced into the thesaurus.[9,10,11,12]

The basic idea is to start with a word frequency list, as before, for the words included in a given document collection. It is also useful to have a listing that exhibits the words in context, so that a distinction may be made between individual word-uses for ambiguous terms. For example, a word such as "base" may be broken down into "base1", "base2", and "base3", to represent, respectively, "army base", "lamp base", and "baseball base" (assuming that those three uses of the term are in fact present in a given collection). A standard keyword-in-context (KWIC) list may be prepared automatically, to permit a human observer to ascertain the individual word-uses for the terms included in a collection. An example of a typical KWIC index list, used in conjunction with the SMART system, is shown in Fig. 17.[1…] Fig. 17 shows that the term "spectral" is used in the given collection in only one sense, namely that of a "spectral norm". The term "square", however, is used in two senses in the concordance excerpt: first as a rectangle of equal sides (square matrix), and then as a power of two (square root).
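Both aids described above, the word frequency list and the keyword-in-context listing, are purely mechanical and easy to produce. The following Python sketch illustrates them. It is not the SMART system's actual implementation. It assumes a simple lowercase alphabetic tokenizer, a fixed window of context words on each side, and a small invented sample collection. The function names `tokenize`, `word_frequencies`, and `kwic` are hypothetical. The sketch counts word occurrences across the collection and then lists every occurrence of a chosen keyword with its neighbouring words. Deciding which occurrences represent distinct word-uses is left to the human reader.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+", text.lower())


def word_frequencies(documents):
    """Count how often each word occurs across the whole collection."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return counts


def kwic(documents, keyword, width=2):
    """Return (doc_id, left context, keyword, right context) for each occurrence.

    At most `width` words are kept on each side of the keyword.
    """
    keyword = keyword.lower()
    entries = []
    for doc_id, doc in enumerate(documents):
        words = tokenize(doc)
        for i, word in enumerate(words):
            if word == keyword:
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                entries.append((doc_id, left, word, right))
    return entries


# Hypothetical sample collection, invented for illustration only.
docs = [
    "an upper bound on the spectral norm of a square matrix",
    "the square root of the determinant is computed",
    "the spectral norm equals the largest singular value",
]

freq = word_frequencies(docs)
print("spectral:", freq["spectral"], " square:", freq["square"])

# Print a KWIC listing: left context right-aligned, keyword in capitals.
for doc_id, left, kw, right in kwic(docs, "square"):
    print(f"{left:>20}  {kw.upper()}  {right}   [doc {doc_id}]")
```

In this toy listing, the two context lines for "square" separate the "square matrix" sense from the "square root" sense. Recognizing these as distinct word-uses, and naming them, for example, square1 and square2, remains the subject expert's task. The program only arranges the evidence.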
The list of word-uses to be constructed would then include a single instance of the term "spectral", but two separate examples of "square"