Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton, M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

occurrences. After applying the three questions of Fig. 19 (b) to the original corpus, one obtains the set of property vectors shown in Fig. 19 (c). After ordering the property sets in increasing numeric order, and combining the word-uses with identical property vectors, a reduced property matrix is obtained, as shown in Fig. 19 (d). This matrix contains 9 property vectors instead of the desired 5. In order to reduce the number of vectors, the class with the smallest frequency count is examined (consisting of the term "logic" with a frequency of 95 instead of the desired [illegible]). The elimination of question B will not avail, since the reduced property vector (3,2) does still not combine with any other row. Eliminating question A, however, produces the reduced matrix of Fig. 19 (e), consisting of five classes with frequencies varying between [illegible] and 632, close enough to the desired value to terminate the process.

Whether the suggested process is always manageable remains to be seen; however, in view of the obvious simplifications involved, and the need for context-limited local decisions only, it seems worthwhile to attempt an implementation in an operational situation.

6. Semi-Automatic Hierarchy Formation

The need for a hierarchical arrangement of terms, or concept classes, as part of an information retrieval system is by no means obvious, although it is easy to find useful applications for a well-constructed hierarchy, particularly when search strategies are considered which are designed to proceed from more general to more specific search formulations or vice-versa.
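
As an illustration of the class-reduction procedure described for Fig. 19 (combine word-uses with identical property vectors, examine the class of smallest frequency, and eliminate a question that lets that class merge with another), the following is a minimal sketch in Python. It is not the report's implementation. The terms other than "logic", all answer values, all frequencies, and the function names `merge_classes` and `reduce_classes` are hypothetical and chosen only so that the example behaves like the one in the text: dropping question B leaves the smallest class isolated, while dropping question A merges it.

```python
from collections import defaultdict

def merge_classes(uses, kept):
    """Combine word-uses whose answers agree on every kept question."""
    classes = defaultdict(lambda: {"terms": [], "freq": 0})
    for term, (answers, freq) in uses.items():
        key = tuple(answers[q] for q in kept)   # reduced property vector
        classes[key]["terms"].append(term)
        classes[key]["freq"] += freq
    # Order the property vectors in increasing numeric order.
    return dict(sorted(classes.items(), key=lambda kv: kv[0]))

def reduce_classes(uses, questions, desired):
    """Drop questions until at most `desired` classes remain, or no
    single elimination lets the smallest class merge with another."""
    kept = list(questions)
    classes = merge_classes(uses, kept)
    while len(classes) > desired and len(kept) > 1:
        smallest = min(classes, key=lambda k: classes[k]["freq"])
        for q in kept:
            trial = [k for k in kept if k != q]
            reduced = merge_classes(uses, trial)
            # Property vector of the smallest class once question q is dropped.
            small_key = tuple(v for k, v in zip(kept, smallest) if k != q)
            if len(reduced[small_key]["terms"]) > len(classes[smallest]["terms"]):
                kept, classes = trial, reduced   # the smallest class merged
                break
        else:
            break   # no elimination helps; stop rather than loop forever
    return kept, classes

# Hypothetical data: term -> (answers to questions A, B, C; frequency).
uses = {
    "logic":   ({"A": 3, "B": 1, "C": 2}, 10),
    "algebra": ({"A": 1, "B": 1, "C": 2}, 40),
    "circuit": ({"A": 1, "B": 2, "C": 1}, 50),
    "relay":   ({"A": 2, "B": 2, "C": 1}, 30),
    "index":   ({"A": 1, "B": 1, "C": 1}, 60),
}

without_b = merge_classes(uses, ["A", "C"])   # "logic" stays alone at (3, 2)
kept, classes = reduce_classes(uses, ["A", "B", "C"], desired=3)
```

With this made-up data, eliminating question A merges "logic" with "algebra" and "circuit" with "relay", leaving three classes defined by questions B and C. The sketch tries questions in their listed order and accepts the first elimination that merges the smallest class; the text does not specify an order, so that choice is an assumption.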