ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
inclusion relations are also of interest, e.g. cause-effect, process-products,
etc., and these can be used to define additional transformations.
For example, the class of elements denoting processes can be identified
and the corresponding products listed. A document image containing a
process term may then be modified to include the associated product term,
and vice versa.
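The process-product transformation just described can be sketched as follows. This is a minimal illustration only, under the assumption that a document image is represented as a set of index terms. The association table, its entries, and the name expand_image are hypothetical and are not taken from the report.

```python
# Hypothetical process -> product associations; the terms are illustrative only.
PROCESS_TO_PRODUCT = {
    "fermentation": "alcohol",
    "combustion": "heat",
}

# Inverse table, so that a product term can also bring in its process term.
PRODUCT_TO_PROCESS = {
    product: process for process, product in PROCESS_TO_PRODUCT.items()
}


def expand_image(image):
    """Return a copy of a document image (a set of index terms) in which
    every process term is joined by its product term, and vice versa."""
    expanded = set(image)
    for term in image:
        if term in PROCESS_TO_PRODUCT:
            expanded.add(PROCESS_TO_PRODUCT[term])
        if term in PRODUCT_TO_PROCESS:
            expanded.add(PRODUCT_TO_PROCESS[term])
    return expanded


if __name__ == "__main__":
    print(sorted(expand_image({"fermentation", "yeast"})))
    # prints ['alcohol', 'fermentation', 'yeast']
```

The expansion is applied without regard to context: every occurrence of a listed term brings in its associate, whether or not the association holds in the document at hand.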
In summary it is possible, then, to consider semantic index
transformations which include a variety of term associations such that
in principle a multiplicity of index representations can be produced
based on the same set of machine recognizable linguistic features. The
problem with such transformations, in general, is that a large number
of a priori semantic associations are possible among the index terms
describing a given document. The correct associations are dependent on
the context in which the terms are used, so that a context-free encoding
such as is generally produced by machine processing does not necessarily
improve the accuracy of the index representation of information content.
C. Syntactic Techniques
In general both the statistical and semantic procedures
discussed above ignore the information carried by the structural
constraints of the natural language. It is possible, however, to
incorporate a number of syntactic recognition features into automatic
indexing algorithms. One obvious use of this kind of information is
stem detection, i.e. recognition of the intrinsic association of the
various morphological forms of a given word. Stem detection is readily