Scientific Report No. ISR-11 Information Storage and Retrieval
An Experimental Investigation of Automatic Hierarchy Generation
G. Blomgren
A. Goodman
L. Kelly
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Hierarchies of concepts or terms are usually prepared manually from
the documents in a particular collection. Such a preparation requires much
time and involves human judgment of relationships between concepts. Human
judgment is likely to vary substantially from person to person, resulting
in various hierarchies from the same document collection; moreover, these
judgments rely on knowledge and experience external (and perhaps extraneous)
to the collection. Analogous problems arise in manual indexing of documents
or abstracts.
These delays and inequities of manual construction might be overcome
by an automatic scheme implemented on a computer. Such a scheme offers
two advantages:
1) Machine preparation eliminates the time-consuming, routine work
in outlining a hierarchy.
2) In making decisions about concept relationships, the machine
depends only upon the particular documents in the collection,
avoiding extraneous information. [2,8]
2. Automatic Construction of Hierarchies
The basic source of information about relationships between terms is
the document-term matrix, a listing of documents showing the degree of
relevance of each term to each document. The first step is the construction
of a non-symmetrical term-term matrix. The authors use the following
algorithm [7]: