ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
An Experimental Investigation of Automatic Hierarchy Generation
G. Blomgren
A. Goodman
L. Kelly
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3) $S_{jk} > K$, $S_{kj} < K$: $T_k$ is a parent of $T_j$, since $T_j$ and $T_k$ appear together often, but $T_k$ is relevant to more documents. (Or, $T_j$ is a son of $T_k$.)
4) $S_{jk} < K$, $S_{kj} > K$: $T_j$ is a parent of $T_k$.
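Cases 3 and 4 can be sketched in Python as follows. This is only an illustration, not the authors' program: the function name `classify_pair` and the returned label strings are hypothetical, and any pair that satisfies neither case is reported as "other", because the remaining cases are defined earlier in the report and are not reproduced here.

```python
def classify_pair(s_jk, s_kj, K):
    """Classify terms T_j and T_k from S_jk, S_kj and the threshold K.

    Only cases 3 and 4 are handled; everything else returns "other".
    """
    if s_jk > K and s_kj < K:
        # Case 3: the terms co-occur often, but T_k covers more documents.
        return "T_k is a parent of T_j"
    if s_jk < K and s_kj > K:
        # Case 4: the mirror image of case 3.
        return "T_j is a parent of T_k"
    return "other"


print(classify_pair(0.8, 0.2, 0.5))
print(classify_pair(0.2, 0.8, 0.5))
```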
The third step is the construction of a hierarchy in a form convenient
for modification of queries. The authors propose a list structure wherein
each term (or concept) owns a list of its parents, a list of its brothers,
and a list of its sons. A term with no parents, brothers, or sons is
called "isolated"; the phenomenon of isolation is discussed below. If a
query is to be generalized, the entries of the parent list of each query
term are added to the query vector; if a query is to be specialized, the
entries of the sons list of each query term are added; if a query is to be
expanded with similar terms, the entries of the brother list of each query
term may be added.
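The list structure and the three kinds of query modification might be sketched in Python as below. The names `TermNode` and `modify_query` are hypothetical, and the query is represented as a plain list of term names rather than a weighted vector, since the text does not say what weights the added terms receive.

```python
from dataclasses import dataclass, field


@dataclass
class TermNode:
    """One term with its three relation lists (an assumed representation)."""
    parents: list = field(default_factory=list)
    brothers: list = field(default_factory=list)
    sons: list = field(default_factory=list)

    def is_isolated(self):
        # A term with no parents, brothers, or sons is "isolated".
        return not (self.parents or self.brothers or self.sons)


# Which relation list each kind of modification draws from.
RELATION_FOR_MODE = {
    "generalize": "parents",
    "specialize": "sons",
    "expand": "brothers",
}


def modify_query(query, hierarchy, mode):
    """Return a new term list: the query plus the related terms for `mode`."""
    relation = RELATION_FOR_MODE[mode]
    result = list(query)
    for term in query:
        node = hierarchy.get(term)
        if node is None:
            continue  # term not present in the hierarchy
        for related in getattr(node, relation):
            if related not in result:
                result.append(related)
    return result


# Hypothetical hierarchy used only to exercise the sketch.
hierarchy = {
    "T1": TermNode(parents=["T3"], brothers=["T2"]),
    "T2": TermNode(brothers=["T1"]),
    "T3": TermNode(sons=["T1"]),
    "T4": TermNode(),
}
print(modify_query(["T1"], hierarchy, "generalize"))
```

Keeping the three relations as separate lists means each kind of modification reads exactly one list per query term.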
These steps are illustrated in the following example.
1) Given the document-term matrix, C:
        T_1  T_2  T_3  T_4
D_1      2    0    5    1
D_2      1    4    1    3
D_3      4    1    3    0
Derive the term-term matrix, S:
(Steps in calculating $S_{12}$ are illustrated)

$$\sum_{i=1}^{3} C_{i1} = 2 + 1 + 4 = 7$$

$$S_{12} = \frac{1}{7}\left[\min(2,0) + \min(1,4) + \min(4,1)\right] = \frac{2}{7}$$
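The same calculation can be checked with a short Python sketch. The general rule used here, $S_{jk} = \sum_i \min(C_{ij}, C_{ik}) / \sum_i C_{ij}$, is inferred from the worked steps for $S_{12}$ and should be read as an assumption. The function name `term_term_matrix` is hypothetical, and the matrix is entered with four term columns because each document row lists four values.

```python
from fractions import Fraction

# Document-term matrix C: rows are documents D_1..D_3, columns are terms.
C = [
    [2, 0, 5, 1],
    [1, 4, 1, 3],
    [4, 1, 3, 0],
]


def term_term_matrix(C):
    """Build S with S[j][k] = sum_i min(C[i][j], C[i][k]) / sum_i C[i][j].

    Assumes every term occurs at least once, so no column sum is zero.
    """
    n_terms = len(C[0])
    col_sums = [sum(row[j] for row in C) for j in range(n_terms)]
    S = [[None] * n_terms for _ in range(n_terms)]
    for j in range(n_terms):
        for k in range(n_terms):
            overlap = sum(min(row[j], row[k]) for row in C)
            S[j][k] = Fraction(overlap, col_sums[j])
    return S


S = term_term_matrix(C)
print(S[0][1])  # S_12, prints 2/7
print(S[1][0])  # S_21, prints 2/5
```

Exact fractions are used so the result can be compared directly with the hand calculation. Note that S is not symmetric, which is what allows the parent and son cases to be told apart.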