Scientific Report No. IRS-13
Information Storage and Retrieval
Chapter: Thesaurus, Phrase and Hierarchy Dictionaries
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Ignoring the semi-automatic "Hastie" ADI Thesaurus-SAl and the Cran-1 Thesaurus-i (made without use of the construction rules), the dictionaries average 594 concepts each, with 10.1 text words grouped into each concept. Sample excerpts from three dictionaries, illustrating how similar terms are grouped in the context of the three collections used, are given in Fig. 2. It may be noted that a topic such as "Algebra" or "Calculate" is grouped only with almost synonymous terms (if any exist) when the topic is central to the collection in use, but a broader grouping is used when the topic is more peripheral to the subject field of the collection. Hyphenated word pairs are normally treated as a single word and usually placed in the group with which they are most closely associated; for example, "computing-machine" is put in the group that includes "computer" rather than the group that includes "machine".

The need to group single words creates problems of ambiguity that are only partially solved by putting such words into more than one group. The word "factor", for example, may need to be grouped with "coefficient" as well as with "parameter" and "variable", but an incoming request containing "factor" then maps into several thesaurus groups, and only the decrease in weight resulting from the multiple mapping is available to minimize the effect of the unwanted association. Some suggestions for further studies on dictionary construction are given in part 8.

3. Description of Phrase Dictionaries

Since the thesaurus dictionaries contain single words only, some kind of phrase processing is a reasonable alternative for dictionary construction.
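
Returning briefly to the single-word thesaurus of the preceding section, the multiple mapping of an ambiguous word such as "factor" can be illustrated by a minimal sketch. The concept numbers, the dictionary excerpt, and the function name map_request are hypothetical, and the rule of dividing a word's unit weight equally among its groups is an assumption; the report states only that multiple mapping results in a decrease in weight.

```python
from collections import defaultdict

# Hypothetical thesaurus excerpt: word -> concept numbers.
# The numbers are illustrative and are not taken from the report.
THESAURUS = {
    "factor": [101, 102],        # ambiguous word placed in two groups
    "coefficient": [101],
    "parameter": [102],
    "variable": [102],
    "computer": [200],
    "computing-machine": [200],  # hyphenated pair treated as a single word
    "machine": [201],
}


def map_request(words, thesaurus=THESAURUS):
    """Map request words to concept weights.

    Assumed rule: each word carries unit weight, divided equally
    among all the thesaurus groups in which the word appears.
    Words absent from the thesaurus contribute nothing.
    """
    weights = defaultdict(float)
    for word in words:
        concepts = thesaurus.get(word.lower(), [])
        for concept in concepts:
            weights[concept] += 1.0 / len(concepts)
    return dict(weights)


request = ["factor", "computing-machine"]
concept_weights = map_request(request)
print(concept_weights)
```

Under these assumptions, "factor" contributes half its weight to each of its two groups, while "computing-machine" maps with full weight to the "computer" group only and never to the "machine" group.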