Scientific Report No. ISR-11 Information Storage and Retrieval
Information Analysis and Dictionary Construction
G. Salton
M. E. Lesk
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
more general ones, and to formulate a search request by starting with a
general formulation and progressively narrowing the specification down to
those areas which appear to be of principal interest. Thus, one can start
with a topic area such as "mathematics", and from there proceed to […],
which is a subdivision of mathematics, from where in turn one can go to
"graph theory", which then leads to "tree structures", from where finally
one can obtain the syntactic dependency trees previously illustrated in
Fig. 7.
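The narrowing process just described amounts to a top-down walk through a topic hierarchy, choosing one more specific son at each level. The following is a minimal sketch, assuming the hierarchy is held as a dictionary from each term to its list of sons. Only the chain "graph theory", "tree structures", "syntactic dependency trees" comes from the text; the sibling terms and the names `HIERARCHY` and `narrow` are hypothetical and do not describe the SMART implementation.

```python
from __future__ import annotations

from typing import Callable

# Illustrative fragment of a subject hierarchy (term -> list of sons).
# The chain graph theory -> tree structures -> syntactic dependency trees
# follows the text; "planar graphs" and "binary trees" are hypothetical.
HIERARCHY: dict[str, list[str]] = {
    "graph theory": ["tree structures", "planar graphs"],
    "tree structures": ["syntactic dependency trees", "binary trees"],
}


def narrow(start: str, choose: Callable[[list[str]], str | None]) -> list[str]:
    """Descend from a broad topic, picking one narrower topic per level.

    `choose` receives the sons of the current topic and returns the son to
    follow, or None to stop at the current level.
    """
    path = [start]
    current = start
    while HIERARCHY.get(current):  # stop when the current term has no sons
        nxt = choose(HIERARCHY[current])
        if nxt is None:
            break
        if nxt not in HIERARCHY[current]:
            raise ValueError(f"{nxt!r} is not a son of {current!r}")
        path.append(nxt)
        current = nxt
    return path


# A chooser that always follows the route toward the terms of interest.
route = {"tree structures", "syntactic dependency trees"}
print(narrow("graph theory", lambda sons: next((s for s in sons if s in route), None)))
# ['graph theory', 'tree structures', 'syntactic dependency trees']
```

Passing the choice in as a function keeps the walk itself independent of how the user (or a program) decides which narrower area is of principal interest.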
In a content analysis system, a hierarchical arrangement of words or
word stems can be used both for information identification and for retrieval
purposes. Thus, if a given search request is formulated in terms of
"syntactic dependency trees", and it is found that not enough useful material
is actually obtained, it is possible to "expand" this request to include all
tree structures or indeed all abstract graphs, by using a hierarchical
subject classification.
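The expansion step can be sketched as climbing one or more levels toward the root and then retrieving everything in the sub-tree below the ancestor reached. This is a minimal sketch under the same assumptions as a dictionary-based hierarchy: the sibling terms and the names `SONS`, `PARENT`, `descendants`, and `expand` are hypothetical, and treating the sub-tree under "graph theory" as standing for "all abstract graphs" is an assumption made only for illustration.

```python
from __future__ import annotations

# Illustrative fragment (term -> list of sons); siblings are hypothetical.
SONS: dict[str, list[str]] = {
    "graph theory": ["tree structures", "planar graphs"],
    "tree structures": ["syntactic dependency trees", "binary trees"],
}
# Invert the son lists: in a tree, every non-root term has exactly one parent.
PARENT: dict[str, str] = {son: p for p, sons in SONS.items() for son in sons}


def descendants(term: str) -> set[str]:
    """Return `term` together with every term below it in the hierarchy."""
    result = {term}
    stack = [term]
    while stack:
        for son in SONS.get(stack.pop(), []):
            if son not in result:
                result.add(son)
                stack.append(son)
    return result


def expand(term: str, levels: int = 1) -> set[str]:
    """Broaden a request: climb `levels` steps toward the root, then take
    the whole sub-tree rooted at the ancestor reached."""
    ancestor = term
    for _ in range(levels):
        if ancestor not in PARENT:
            break  # already at a root; cannot climb further
        ancestor = PARENT[ancestor]
    return descendants(ancestor)


# One level up: all tree structures.
print(sorted(expand("syntactic dependency trees", 1)))
# ['binary trees', 'syntactic dependency trees', 'tree structures']
# Two levels up: the whole graph sub-tree (standing in for "all abstract graphs").
print(len(expand("syntactic dependency trees", 2)))
# 5
```

Choosing `levels` lets the broadening be applied gradually: one step if a little more material is wanted, more steps if the request still retrieves too little.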
A hierarchy of concept numbers is included in the SMART system, and
it is assumed that a thesaurus look-up operation precedes any hierarchical
expansion operation. A typical example from the SMART concept hierarchy
is shown in Fig. 8. The broad, more general concepts appear on the left
side of the figure, corresponding to the "roots" of the hierarchical tree,
and the more specific concepts appear further to the right. For example,
concept 270 is the root of a sub-tree; this concept has four sons on the
next lower level, namely concepts 224, 471, 472, and 488. Concept 221
in turn has two sons, labelled 261 and 331; similarly, concept 471 has
four sons, namely 338, 371, 458, and 470. It may be seen from Fig. 8
that the sons of a concept, representing more specific terms, are shown