Scientific Report No. IRS-13 Information Storage and Retrieval
An Experiment in Automatic Thesaurus Construction
R. T. Dattola
D. M. Murray
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
VIII. An Experiment in Automatic Thesaurus Construction
R. T. Dattola and D. M. Murray
Abstract
A method is presented for the automatic construction of thesauruses
used in information retrieval systems. The construction algorithm is based
on the concept-concept associations displayed in a sample document collection.
1. Introduction
Information retrieval systems often use a thesaurus look-up to
determine the information content of
a) documents put into the system and
b) requests for information from the system. [1]
With respect to documents, the look-up reduces the written text to a set of
concept numbers representing the keywords, phrases, and ideas of the text.
With respect to requests for information, the thesaurus look-up expands a
query by assigning to it concept numbers which represent more general ideas
than those in the original query.
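The two uses of the look-up described above can be illustrated with a minimal sketch. The word list, the concept numbers, and the table of broader concepts are hypothetical and serve only as an illustration; they are not taken from any actual thesaurus, and the functions `lookup` and `expand_query` are assumed names. No suffix removal is attempted, so only exact word matches are found.

```python
import re

# Hypothetical toy thesaurus: word -> concept number (illustrative values only).
THESAURUS = {
    "thesaurus": 101,
    "dictionary": 101,
    "retrieval": 102,
    "search": 102,
    "document": 103,
    "text": 103,
}

# Hypothetical hierarchy: concept number -> broader concept number.
BROADER = {101: 200, 102: 201, 103: 201}


def lookup(text):
    """Reduce text to the set of concept numbers of the words it contains."""
    words = re.findall(r"[a-z]+", text.lower())
    return {THESAURUS[w] for w in words if w in THESAURUS}


def expand_query(query):
    """Look up a query, then add the broader concept of each concept found."""
    concepts = lookup(query)
    return concepts | {BROADER[c] for c in concepts if c in BROADER}
```

Under these assumptions, a document is represented by the output of `lookup`, while a request is represented by the output of `expand_query`, which may match documents indexed under the more general concept numbers.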
Thesaurus construction is often performed by hand or by semi-automatic
methods [2]. Hand preparation is time-consuming and relies on human judgment
to determine the desired thesaurus classes. Semi-automatic methods require
less intellectual effort, but still depend on human intervention. A
fully automatic method is desirable, since it would
a) provide rapid construction,
b) form thesaurus classes strictly on the basis of information in
the document collection under consideration, and
c) apply easily to a wide range of subject areas. [4]