Scientific Report No. IRS-13
Information Storage and Retrieval
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

VIII. An Experiment in Automatic Thesaurus Construction

R. T. Dattola and D. M. Murray

Abstract

A method is presented for the automatic construction of thesauruses used in information retrieval systems. The construction algorithm is based on the concept-concept associations displayed in a sample document collection.

1. Introduction

Information retrieval systems often use a thesaurus look-up to determine the information content of (a) documents put into the system and (b) requests for information from the system [1]. For documents, the look-up reduces the written text to a set of concept numbers representing the keywords, phrases, and ideas of the text. For requests, the look-up expands a query by assigning to it concept numbers that represent more general ideas than those in the original query.

Thesaurus construction is often performed by hand or by semi-automatic methods [2]. Hand preparation is time-consuming and relies on human judgment to determine the desired thesaurus classes. Semi-automatic methods demand less intellectual effort, but they still depend on human intervention. A fully automatic method is desirable, since it would (a) provide rapid construction, (b) form thesaurus classes strictly on the basis of information in the document collection under consideration, and (c) apply easily to a wide range of subject areas [4].
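The two uses of the thesaurus look-up described in the introduction (reducing text to concept numbers, and expanding a query with more general concepts) can be illustrated with a minimal sketch. Everything in it is a hypothetical assumption for illustration: the word-to-concept table `THESAURUS`, the "broader concept" table `BROADER`, the concept numbers, and the helper names `lookup` and `expand_query`. It is not the authors' thesaurus or their construction algorithm, and it ignores phrases and word stemming, which a practical look-up would need.

```python
import re

# Hypothetical toy thesaurus: each word maps to a concept number.
# Words sharing a concept number form one thesaurus class.
THESAURUS = {
    "retrieval": 101,
    "search": 101,
    "document": 102,
    "text": 102,
    "thesaurus": 103,
    "dictionary": 103,
}

# Hypothetical "broader concept" links used only for query expansion:
# concept number -> concept numbers representing more general ideas.
BROADER = {
    101: {200},  # retrieval/search -> a more general "information processing" concept
    103: {201},  # thesaurus/dictionary -> a more general "reference tools" concept
}


def lookup(text: str) -> set[int]:
    """Reduce free text to the set of concept numbers of its recognised words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {THESAURUS[w] for w in words if w in THESAURUS}


def expand_query(query: str) -> set[int]:
    """Look up a query, then add the broader concepts of every matched concept."""
    concepts = lookup(query)
    expanded = set(concepts)
    for c in concepts:
        expanded |= BROADER.get(c, set())
    return expanded


if __name__ == "__main__":
    print(sorted(lookup("Text retrieval with a thesaurus")))  # [101, 102, 103]
    print(sorted(expand_query("dictionary search")))  # [101, 103, 200, 201]
```

Note that words absent from the thesaurus ("with", "a") simply contribute no concept numbers, so the quality of both the document representation and the query expansion depends entirely on how the thesaurus classes were formed, which is the problem the automatic construction method addresses.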