ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
value, or degree, to which each attribute pertains to the document by
associating a scalar with each attribute. In this case the index
images can be encoded as numeric rather than binary description
vectors in the attribute space. Table 2.1 illustrates a typical
keyword description derived by statistical analysis (from Booth [16]) in
which the relative frequencies of the 15 most frequent non-common word
stem types from a sample document are shown. This analysis can be
used to establish a property set index image (by employing a frequency
sensitive selection procedure), a binary description vector, or a
numeric description vector incorporating relative frequency information.
Symbolic examples of each of these are illustrated in Figure 2.2.
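The three index images named above can be sketched in a few lines of
Python. This is a minimal illustration, not the procedure used by Booth
or in this report: the function name index_images, the parameter
min_rel_freq, the sample stems, and the five-attribute space are all
hypothetical, and relative frequency is assumed here to be taken over
the stems that fall inside the attribute space.

```python
from collections import Counter


def index_images(stems, attribute_space, min_rel_freq=0.1):
    """Build three index images of one document from its word stems.

    Assumptions (not from the source): relative frequency is computed
    over stems that belong to the attribute space, and the property set
    keeps attributes whose relative frequency reaches min_rel_freq.
    """
    # Count only stems that are attributes of the index language.
    counts = Counter(s for s in stems if s in attribute_space)
    total = sum(counts.values())
    rel = {a: (counts[a] / total if total else 0.0) for a in attribute_space}

    # Property set: frequency-sensitive selection of attributes.
    property_set = {a for a in attribute_space
                    if counts[a] > 0 and rel[a] >= min_rel_freq}
    # Binary description vector: 1 if the attribute occurs at all.
    binary_vector = [1 if counts[a] > 0 else 0 for a in attribute_space]
    # Numeric description vector: relative frequency per attribute.
    numeric_vector = [rel[a] for a in attribute_space]
    return property_set, binary_vector, numeric_vector


# Hypothetical sample document and attribute space.
attribute_space = ["document", "index", "retriev", "vector", "tree"]
stems = ["index", "index", "retriev", "index",
         "document", "retriev", "vector", "index"]

prop_set, binary, numeric = index_images(stems, attribute_space,
                                         min_rel_freq=0.2)
print(sorted(prop_set))  # ['index', 'retriev']
print(binary)            # [1, 1, 1, 1, 0]
print(numeric)           # [0.125, 0.5, 0.25, 0.125, 0.0]
```

The property set discards position and weight, the binary vector fixes
one coordinate per attribute, and the numeric vector keeps the scalar
associated with each attribute.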
A property list description does not allow for a direct
representation of any relations among the various attributes, unless
these are specifically identified in the attribute space. Since
information in the natural language is conveyed by semantic referents
(words, phrases) and by the relations indicated among the referents
(syntax and context), index languages capable of explicitly
representing relations among attributes have been investigated. A
variety of such structures have been studied [17], including tree and
graph representations. A syntactic dependency tree, for example, can
represent a natural language sentence by associating its nodes with
the semantic values of the words they represent, and its branches
with direct syntactic dependency. An example (from Sussenguth [18]) is
illustrated in Figure 2.3. While such index structures are capable
of more precise modeling of the information carrying elements of the