Scientific Report No. ISR-11
Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton, M. E. Lesk
Harvard University

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Question B is first applied to the complete vocabulary, forming two groups, "physical objects" and "abstractions or processes", with frequencies of 1119 and 1079, respectively. Question C is then used to furnish the five classes already shown in Fig. 20.

A somewhat different process operates directly from the word-use frequencies and is therefore not based on the thesaurus groupings, as the previous method is. Instead, the hierarchy is constructed first, and the thesaurus is later derived from that hierarchy. A start is made, as before, with a concordance and a word frequency list, and the word-uses to be included in the hierarchy are selected. The two-way hierarchy is then started by choosing the word-use with the highest frequency, say word $T_i$, and letting one node represent $T_i$ plus all words like it, with the second branch representing all "other" words not related to $T_i$. The word group with the highest total frequency is now chosen, and its highest-frequency word is again used as the criterion for partitioning; this procedure continues until all word groups are small enough to be entered into the thesaurus as concept classes.

At each point in the partitioning process the following local decisions must be made: 1) the highest-frequency word in the high-frequency word group is chosen and used as the "central"
word of the subbranch; the other words in the same word group are then examined to see whether they fall into the same subbranch by being related in one way or another to the central word; no relations need exist among the words that form the "other", unrelated class; 2) if a given word cannot properly be placed in either of the two categories (related to the central word, or unrelated), it is left at the present level as a parent of the words in the