ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval An Experimental Investigation of Automatic Hierarchy Generation chapter G. Blomgren A. Goodman L. Kelly Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A concept which is unrelated to all other concepts is called "isolated". Two types of isolation may be defined. Consider the entries in a term-term matrix. For a given cutoff value K, a concept is "conditionally isolated" if all entries relating to it are less than K. A concept is "unconditionally isolated" if (1) it is assigned to no document in the collection; or (2) when it is assigned to a document, it is always the only concept assigned. The latter type of concept remains isolated for all K > 0.

The above discussion and example illustrate that not all information about concept relationships is contained in one hierarchy constructed for one cutoff value. As K varies from 0 to 1, some relationships endure over a wide range of K-values (say 0.1 to 0.9); these relations are well-defined, or strong. Other relationships appear only once, or over a small range of K-values (say 0.2 to 0.3); these relations are less well-defined, or weak. While one user may be satisfied with a hierarchy which contains weak relationships, another user may desire a hierarchy containing only very well-defined relationships; neither would be satisfied with a hierarchy which specifies well-defined relationships only in the region of a particular K-value.

The authors suggest a fourth step: the construction of "composite" hierarchies. For a given range R of K-values, a composite hierarchy is generated to include only those relationships which exist over a range > R.
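The selection rule for a composite hierarchy can be illustrated with a minimal Python sketch. It is not the authors' program. It assumes that the relation between each pair of concepts has already been determined at a set of sampled cutoff values, and that the "range" of a relation is the distance between the first and last sampled K-value of its longest unbroken run. The function name `composite_relations`, the parameter `min_range`, and the input layout are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def composite_relations(relations_by_k, min_range):
    """Keep only relations that persist over a range of K wider than min_range.

    relations_by_k: dict mapping a cutoff K to a dict that maps a concept
        pair, e.g. ("A", "B"), to a relation label such as "parent-son".
        (Hypothetical layout; the report does not prescribe one.)
    min_range: minimum width of K-values a relation must span to be kept.
    Returns a dict mapping each pair to the set of labels that qualify.
    """
    best = defaultdict(float)  # (pair, label) -> widest unbroken span seen
    start = {}                 # (pair, label) -> K at which current run began

    for k in sorted(relations_by_k):
        current = set(relations_by_k[k].items())
        # A run ends as soon as the relation is absent at a sampled K.
        for key in list(start):
            if key not in current:
                del start[key]
        # Open new runs and extend the ones that continue.
        for key in current:
            start.setdefault(key, k)
            best[key] = max(best[key], k - start[key])

    composite = defaultdict(set)
    for (pair, label), span in best.items():
        if span > min_range:
            composite[pair].add(label)
    return dict(composite)


# Illustration using the ranges quoted in the text: a strong relation that
# holds from K = 0.1 to 0.9 and a weak one that holds only from 0.2 to 0.3.
sampled_ks = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
relations_by_k = {}
for k in sampled_ks:
    relations_by_k[k] = {("A", "B"): "parent-son"}
    if k in (0.2, 0.3):
        relations_by_k[k][("C", "D")] = "brother"

print(composite_relations(relations_by_k, min_range=0.5))
```

With `min_range=0.5` only the relation for the pair A, B is retained; the relation for C, D spans about 0.1 and is dropped. Because the result holds a set of labels per pair, a pair may qualify under more than one relation, and a rule is then needed to choose among them.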
It is possible that brother, parent-son, and "unrelated" relations for a pair of concepts may all exist over ranges > R; in such a case the authors recommend that the parent-son relation take precedence over the "unrelated" relation. The reason for such a decision rule is that a