Scientific Report No. ISR-11 Information Storage and Retrieval
An Experimental Investigation of Automatic Hierarchy Generation
G. Blomgren
A. Goodman
L. Kelly
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
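[Figure: the document-term matrix, with rows $D_1, \ldots, D_d$, columns $T_1, \ldots, T_t$, and entries $C_{ij}$; and the term-term matrix, with rows and columns $T_1, \ldots, T_t$ and entries $S_{jk}$.]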
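where

$$
S_{jk} = \frac{\sum_{i=1}^{d} \min(C_{ij}, C_{ik})}{\sum_{i=1}^{d} C_{ij}},
\qquad
S_{kj} = \frac{\sum_{i=1}^{d} \min(C_{ij}, C_{ik})}{\sum_{i=1}^{d} C_{ik}}
\qquad \text{for } j \neq k,
$$

$$
0 \le S_{jk} \le 1 \qquad \text{for } j \neq k.
$$

$S_{jj}$ is not defined and is never used. Both quantities share the same numerator, so

$$
S_{jk} \sum_{i=1}^{d} C_{ij} = S_{kj} \sum_{i=1}^{d} C_{ik},
$$

so that

$$
S_{kj} = S_{jk} \, \frac{\sum_{i=1}^{d} C_{ij}}{\sum_{i=1}^{d} C_{ik}}.
$$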
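The computation of the term-term matrix from the document-term matrix can be illustrated with a minimal Python sketch. This is not the program used in the report; the function name `term_similarity`, the list-of-lists layout, and the small example matrix are assumptions made only for illustration. Undefined entries (the diagonal, and any row for a term that occurs in no document) are left as `None`, which is also an assumption.

```python
def term_similarity(C):
    """Return the asymmetric term-term matrix S for a document-term matrix C.

    C is a list of d rows (documents), each a list of t non-negative weights.
    For j != k, S[j][k] = sum_i min(C[i][j], C[i][k]) / sum_i C[i][j].
    S[j][j] stays None because the diagonal is not defined.
    """
    d = len(C)
    t = len(C[0]) if d else 0
    # Column sums: total weight of each term over all documents.
    col_sums = [sum(C[i][j] for i in range(d)) for j in range(t)]
    S = [[None] * t for _ in range(t)]
    for j in range(t):
        if col_sums[j] == 0:
            continue  # assumption: a term with no occurrences has no defined row
        for k in range(t):
            if j == k:
                continue
            overlap = sum(min(C[i][j], C[i][k]) for i in range(d))
            S[j][k] = overlap / col_sums[j]
    return S


# Hypothetical example: 4 documents, 3 terms, binary weights.
C = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 1],
]
S = term_similarity(C)
```

In this example term 1 occurs only in documents that also contain term 0, so `S[1][0]` is 1.0 while `S[0][1]` is 2/3, which shows why the matrix is not symmetric.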
The second step is the evaluation of relationships between pairs of
terms. Choosing a "cutoff" parameter $0 < K < 1$, apply the following rules
[7]:
1) $S_{jk} < K$, $S_{kj} < K$: $T_j$ and $T_k$ are unrelated, since the two
   terms generally are not relevant to the same documents.
2) $S_{jk} > K$, $S_{kj} > K$: $T_j$ and $T_k$ are similar, since both terms
   generally are relevant to the same documents.
Similar terms are called brothers.
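Rules 1 and 2 can be sketched as a small Python function. The name `classify_pair` and the string labels are hypothetical, and any pair that satisfies neither rule is returned as `None` rather than guessed at.

```python
def classify_pair(s_jk, s_kj, K):
    """Classify one pair of terms using rules 1 and 2 only.

    s_jk and s_kj are the two asymmetric similarity values for the pair,
    and K is the cutoff parameter, which must satisfy 0 < K < 1.
    """
    if not 0 < K < 1:
        raise ValueError("cutoff K must satisfy 0 < K < 1")
    if s_jk < K and s_kj < K:
        return "unrelated"   # rule 1: both values fall below the cutoff
    if s_jk > K and s_kj > K:
        return "brothers"    # rule 2: both values exceed the cutoff
    return None              # not covered by rules 1 and 2


# Hypothetical values, chosen only to exercise each branch.
print(classify_pair(0.2, 0.1, 0.4))
print(classify_pair(0.5, 0.5, 0.4))
print(classify_pair(2 / 3, 1.0, 0.8))
```

With these inputs the first pair is unrelated, the second pair are brothers, and the third pair falls under neither rule because only one of its two values exceeds the cutoff.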