Scientific Report No. ISR-11 Information Storage and Retrieval
An Experimental Investigation of Automatic Hierarchy Generation
G. Blomgren
A. Goodman
L. Kelly
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3. Outline of the Investigation
The investigation proceeds in the following stages:
1) Implementation of the program to generate a term-term matrix.
2) Implementation of the program to set up list structures using
cutoff values.
3) Implementation of a program to present the list structure and
hierarchy in forms convenient for study.
4) Investigation of the effect of varying K for an actual S-matrix.
Attempt to confirm theory about variations and range behavior.
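
Stages 1 and 2 can be illustrated with a minimal sketch. It rests on assumptions not stated in this section: the term-term matrix is taken to be the cosine similarity between term columns of a document-term matrix, and the cutoff K is applied as a simple threshold that keeps every pair with similarity of at least K. The names term_term_matrix, cutoff_lists, and doc_term are hypothetical and do not come from the original programming package.

```python
from math import sqrt


def term_term_matrix(doc_term):
    """Build a symmetric term-term similarity matrix.

    doc_term[d][t] is the weight of term t in document d.
    ASSUMPTION: cosine similarity between term columns; the report's
    actual similarity measure may differ.
    """
    n_terms = len(doc_term[0])
    # One column vector per term, listing its weight in every document.
    cols = [[row[t] for row in doc_term] for t in range(n_terms)]
    norms = [sqrt(sum(x * x for x in col)) for col in cols]
    S = [[0.0] * n_terms for _ in range(n_terms)]
    for i in range(n_terms):
        for j in range(n_terms):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(cols[i], cols[j]))
                S[i][j] = dot / (norms[i] * norms[j])
    return S


def cutoff_lists(S, K):
    """For each term, list the other terms whose similarity is at least K.

    ASSUMPTION: a single threshold K; this is only one possible reading
    of "list structures using cutoff values".
    """
    n = len(S)
    return {i: [j for j in range(n) if j != i and S[i][j] >= K]
            for i in range(n)}


# Three documents, three terms: terms 0 and 1 always co-occur, term 2 is alone.
doc_term = [[1, 1, 0],
            [1, 1, 0],
            [0, 0, 1]]
S = term_term_matrix(doc_term)
print(cutoff_lists(S, 0.5))
```

Raising K in this sketch removes links from the lists and lowering it adds them, which is the kind of variation that stage 4 examines on the real matrix.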
Since the aim of this investigation is the study of the techniques
and problems involved in automatic generation of hierarchies, and since
extensive use of tapes results in processing delays, the programming package
is designed for in-core operations. The 100 concepts used are a subset of
the 550 concepts in a collection of 82 documents previously used by the
SMART system (ADI Collection).
In an actual retrieval system the processing involved in modifying a
query uses only the list structure; however, for visual examination of the
hierarchy, this structure is not as convenient as a graph. The output
program generates a graph similar to those in the examples above. To test
the output section of the programming package, a typical hierarchy was
constructed containing most of the relationships likely to occur. The
output resulting from this example appears in Appendix A.
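
As a rough illustration of turning a list structure into a form convenient for visual study, the sketch below prints a hierarchy as an indented outline. This is an assumed stand-in for the graph output described above, not a reconstruction of it. The mapping children (parent term to its sons) and the function render_hierarchy are hypothetical names introduced for this example.

```python
def render_hierarchy(children, root, depth=0, seen=None):
    """Return the lines of an indented outline rooted at `root`.

    children maps a parent term to the list of its sons (hypothetical
    layout). A term reached a second time is printed but not expanded
    again, so shared sons and cycles cannot cause endless recursion.
    """
    seen = set() if seen is None else seen
    lines = ["  " * depth + str(root)]
    if root in seen:
        return lines
    seen.add(root)
    for son in children.get(root, []):
        lines.extend(render_hierarchy(children, son, depth + 1, seen))
    return lines


# A small made-up hierarchy: A is the parent of B and C, B is the parent of D.
children = {"A": ["B", "C"], "B": ["D"]}
print("\n".join(render_hierarchy(children, "A")))
```

Brother terms appear at the same indentation under a common parent, and an isolated term would appear as a root with no lines beneath it.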
Using the actual term-term matrix and various cutoff values, the
behavior of the hierarchical structure and the range phenomena were studied.
The anticipated transitions from brother-brother to parent-son to isolated