Scientific Report No. ISR-11, Information Storage and Retrieval. Information Analysis and Dictionary Construction. G. Salton, M. E. Lesk. Harvard University, Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

IV-63

It has been remarked in this connection that when words, or word-uses, of unequal frequency are included in a thesaurus, or represented on an association map of the type shown in Fig. 16, a hierarchical arrangement results almost inevitably, since frequent words can be made into categories, and words of lesser frequency into subcategories. Hierarchical association maps have in fact been constructed, using the frequency characteristics of the words as a criterion.[15]

In any case, no matter what procedure is actually adopted, it would seem that a useful hierarchy which places general concepts near the top of the tree, and specific ones near the bottom, must exhibit the expected frequency characteristics which generally hold between broad and specific terms.

Since the construction of a complete hierarchy without any guidelines is at the least a thankless task, and at worst an impossible one, methods must be investigated to generate hierarchical arrangements semi-automatically. Three different procedures are outlined, all of which are based on a term-property matrix of the type shown in Fig. 18, or a term-document matrix as shown in Fig. 15 (a).

The first process directly uses the questions also used for thesaurus construction, and breaks down the initial vocabulary as a function of the responses elicited. An initial question is asked first, and classes of word-uses are formed based on the responses to this question; the next question is then applied to each of the resulting word classes, which are thereby broken down again, and so on, until the subdivision is sufficiently fine. The process is applied to the vocabulary of Fig.
19 (a) in conjunction with the questions of Fig. 19 (b). The resulting hierarchy is shown in Fig. 20, with the word-use frequency attached to each node.
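
The question-driven subdivision just described can be illustrated by a short sketch in Python. It is not the procedure used to produce Fig. 20: the four terms, the two questions, the responses, and the frequencies are invented for illustration and do not reproduce Fig. 19, and the names build_hierarchy and min_size are hypothetical. The sketch assumes that each word-use gives exactly one response to each question, that the questions are applied in a fixed order, that the frequency attached to a node is the sum of the frequencies of the word-uses beneath it, and that the subdivision is "sufficiently fine" when a class holds no more than min_size word-uses or the questions are exhausted.

```python
from collections import defaultdict

def build_hierarchy(terms, questions, responses, freq, depth=0, min_size=1):
    """Split `terms` by their response to successive questions.

    Each node records its word-uses, their summed frequency (an assumed
    aggregation rule), and one child class per distinct response.
    """
    node = {
        "terms": sorted(terms),
        "frequency": sum(freq[t] for t in terms),
        "children": {},
    }
    # Assumed stopping rule: questions exhausted or class small enough.
    if depth == len(questions) or len(terms) <= min_size:
        return node
    question = questions[depth]
    classes = defaultdict(list)
    for term in terms:
        classes[responses[term][question]].append(term)
    # Apply the next question to each resulting class.
    for answer, members in classes.items():
        node["children"][answer] = build_hierarchy(
            members, questions, responses, freq, depth + 1, min_size
        )
    return node

# Hypothetical data (not taken from Fig. 19).
questions = ["denotes_object", "is_machine"]
responses = {
    "computer": {"denotes_object": "yes", "is_machine": "yes"},
    "library": {"denotes_object": "yes", "is_machine": "no"},
    "program": {"denotes_object": "no", "is_machine": "no"},
    "search": {"denotes_object": "no", "is_machine": "no"},
}
freq = {"computer": 12, "library": 5, "program": 8, "search": 3}

tree = build_hierarchy(list(freq), questions, responses, freq)
```

With these invented values the root carries the total frequency of the vocabulary, and each class carries the total of its members, so that the broader classes near the top of the tree show the higher frequencies, in line with the frequency characteristics discussed above. A class whose members all give the same response to a question is passed down unsplit to the next question.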