Scientific Report No. ISR-11
Information Storage and Retrieval

An Experimental Investigation of Automatic Hierarchy Generation

G. Blomgren, A. Goodman, L. Kelly
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Hierarchies of concepts or terms are usually prepared manually from the documents in a particular collection. Such a preparation requires much time and involves human judgment of relationships between concepts. Human judgment is likely to vary substantially from person to person, resulting in various hierarchies from the same document collection; moreover, these judgments rely on knowledge and experience external (and perhaps extraneous) to the collection. Analogous problems arise in manual indexing of documents or abstracts.

These delays and inequities of manual construction might be overcome by an automatic scheme implemented on a computer. Such a scheme offers two advantages:

1) Machine preparation eliminates the time-consuming, routine work in outlining a hierarchy.

2) In making decisions about concept relationships, the machine depends only upon the particular documents in the collection, avoiding extraneous information. [2,8]

2. Automatic Construction of Hierarchies

The basic source of information about relationships between terms is the document-term matrix, a listing of documents showing the degree of relevance of each term to each document. The first step is the construction of a non-symmetrical term-term matrix. The authors use the following algorithm [7]: