Scientific Report No. ISR-11
Information Storage and Retrieval

An Experimental Investigation of Automatic Hierarchy Generation

G. Blomgren, A. Goodman, L. Kelly
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

3. Outline of the Investigation

The investigation proceeds in the following stages:

1) Implementation of the program to generate a term-term matrix.
2) Implementation of the program to set up list structures using cutoff values.
3) Implementation of a program to present the list structure and hierarchy in forms convenient for study.
4) Investigation of the effect of varying K for an actual S-matrix, and an attempt to confirm the theory about variations and range behavior.

Since the aim of this investigation is the study of the techniques and problems involved in the automatic generation of hierarchies, and since extensive use of tapes results in processing delays, the programming package is designed for in-core operation. The 100 concepts used are a subset of the 550 concepts in a collection of 82 documents previously used by the SMART system (ADI Collection).

In an actual retrieval system, the processing involved in modifying a query uses only the list structure; for visual examination of the hierarchy, however, this structure is not as convenient as a graph. The output program therefore generates a graph similar to those in the examples above. To test the output section of the programming package, a typical hierarchy was constructed containing most of the relationships likely to occur. The output resulting from this example appears in Appendix A.

Using the actual term-term matrix and various cutoff values, the behavior of the hierarchical structure and the range phenomena were studied. The anticipated transitions from brother-brother to parent-son to isolated