Scientific Report No. IRS-13
Information Storage and Retrieval
Summary
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Thesaurus dictionaries, phrase dictionaries, and hierarchical arrangements of terms are described and evaluated for retrieval effectiveness in section VII. A thesaurus is generally used to assemble certain terms into common thesaurus groups according to specified similarity criteria. All terms within the same group can then be reduced to a unique class number, which provides a certain amount of language normalization. The best thesaurus dictionaries produce an average retrieval performance superior to that provided by the stem dictionaries. For high-precision users, however, the thesaurus results do not differ greatly from the stem results. Thesaurus construction rules have been devised to ensure that the resulting thesaurus will in fact operate satisfactorily in a retrieval environment and produce the expected improvements for high-recall users.

The results exhibited in section VII for the phrase dictionaries and hierarchical subject arrangements show that the effect of these devices is not yet sufficiently reliable to warrant their inclusion in operational situations. Section VII also suggests additional retrieval experiments using stored dictionaries, and the generation of additional language normalization tools.

An experiment in fully automatic thesaurus construction is described in section VIII by R. T. Dattola and D. M. Murray. The procedure consists of breaking a document collection down into sub-collections using document-document correlation methods. For each sub-collection, a thesaurus is then constructed using term-term correlation methods. Finally,