ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
An Experimental Investigation of Automatic Hierarchy Generation
chapter
G. Blomgren
A. Goodman
L. Kelly
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A concept which is unrelated to all other concepts is called "isolated".
Two types of isolation may be defined. Consider the entries in a term-term
matrix. For a given cutoff value K, a concept is "conditionally isolated"
if all entries relating to it are less than K. A concept is
"unconditionally isolated" if (1) it is assigned to no document in the
collection; or (2) when it is assigned to a document, it is always the
only concept assigned. The latter type of concept remains isolated for all K > 0.
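
The two definitions above can be sketched in code. This is a minimal illustration, not the authors' program: the function names, the list-of-rows matrix layout, and the representation of each document as a set of concept indices are all assumptions made for the example. The sketch also assumes that "all entries relating to it" excludes the diagonal (a concept's correlation with itself).

```python
def conditionally_isolated(term_term, k):
    """Return indices of concepts whose off-diagonal entries are all below k.

    term_term is a square list of rows (hypothetical layout); the diagonal
    is skipped on the assumption that self-correlation does not count.
    """
    isolated = []
    for i, row in enumerate(term_term):
        if all(value < k for j, value in enumerate(row) if j != i):
            isolated.append(i)
    return isolated


def unconditionally_isolated(assignments, num_concepts):
    """Return indices of concepts that never co-occur with another concept.

    assignments holds one set of concept indices per document (hypothetical
    layout). A concept qualifies if it is assigned to no document, or if
    every document it is assigned to contains that concept alone.
    """
    isolated = []
    for concept in range(num_concepts):
        docs = [doc for doc in assignments if concept in doc]
        # all() over an empty list is True, which covers case (1).
        if all(len(doc) == 1 for doc in docs):
            isolated.append(concept)
    return isolated


# Illustrative data only; these values are not taken from the report.
example_matrix = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
example_documents = [{0, 1}, {1, 2}, {3}, {0, 1}]

print(conditionally_isolated(example_matrix, 0.5))
print(unconditionally_isolated(example_documents, 5))
```

In the illustrative data, concept 3 is always assigned alone and concept 4 is never assigned, so both satisfy the unconditional definition regardless of K, while the set of conditionally isolated concepts changes as the cutoff is raised or lowered.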
The above discussion and example illustrate that not all information
about concept relationships is contained in one hierarchy constructed
for one cutoff value. As K varies from 0 to 1, some relationships endure
over a wide range of K-values (say 0.1 to 0.9); these relations are well-
defined, or strong. Other relationships appear only once, or over a small
range of K-values (say 0.2 to 0.3); these relations are less well-defined,
or weak.
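
One way to make the strong/weak distinction concrete is to record, for a single pair of concepts, the relation found at each sampled cutoff and then measure the widest contiguous span of K over which each relation holds. The sketch below assumes such observations already exist; the function name, the (K, relation) pair format, and the sample data are hypothetical and are not drawn from the report.

```python
def relation_spans(observations):
    """Return {relation: widest contiguous span of K over which it holds}.

    observations is a list of (k, relation) pairs for one concept pair,
    sorted by increasing k (hypothetical format). A relation seen at only
    one sampled cutoff gets a span of 0.0.
    """
    spans = {}
    run_relation, run_start, run_end = None, None, None
    for k, relation in observations:
        if relation == run_relation:
            run_end = k  # the current run continues
        else:
            if run_relation is not None:
                width = run_end - run_start
                spans[run_relation] = max(spans.get(run_relation, 0.0), width)
            run_relation, run_start, run_end = relation, k, k
    if run_relation is not None:
        width = run_end - run_start
        spans[run_relation] = max(spans.get(run_relation, 0.0), width)
    return spans


# Illustrative observations only: a relation that endures from 0.1 to 0.9,
# and one that appears only between 0.2 and 0.3.
enduring_pair = [(k / 10, "parent-son") for k in range(1, 10)]
fleeting_pair = [
    (0.1, "unrelated"),
    (0.2, "brother"),
    (0.3, "brother"),
    (0.4, "unrelated"),
    (0.5, "unrelated"),
]

print(relation_spans(enduring_pair))
print(relation_spans(fleeting_pair))
```

Under this measure, a relation with a wide span corresponds to what the text calls strong, and one with a narrow or zero span to what it calls weak; where to draw the line between the two is left to the user.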
While one user may be satisfied with a hierarchy which contains weak
relationships, another user may desire a hierarchy containing only very
well-defined relationships; neither would be satisfied with a hierarchy
which specifies well-defined relationships only in the region of a
particular K-value.
The authors suggest a fourth step: the construction of "composite"
hierarchies. For a given range R of K-values, a composite hierarchy is
generated to include only those relationships which exist over a range
> R. It is possible that brother, parent-son, and "unrelated" relations
for a pair of concepts may all exist over ranges > R; in such a case the
authors recommend that the parent-son relation take precedence over the
"unrelated" relation. The reason for such a decision rule is that a