ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Such extensions pose obvious problems on the practical level.
An important defect in such an index transformation lies in the
fact that the structure of the index image provides no facility for
representing the semantic associations which exist between distinct
word types in the natural language. One proposal for dealing with such
associations on a statistical basis consists in assuming that they can
be derived a posteriori from a set of index images characterizing a
document collection in some given subject area. Thus one can assume,
for example, that terms which co-occur in the sentences of a given
document, or in the documents of a given collection, more frequently
than the average are, in fact, semantically as well as statistically
related [10,11,12]. In the formal associative model, it is possible to
account for keyword associations of higher order than the first and,
in addition, to use these associations to influence query-document
matching procedures. In such a system, a document is represented by
its keyword set and additionally by the statistical properties of the
representations of all other documents in the collection.
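
The statistical association idea described above can be illustrated with a
minimal sketch. It is not the formal associative model of the cited work: it
treats each document as a set of keywords, counts how many documents contain
both members of each keyword pair, and calls a pair "associated" when its count
exceeds the mean count over all observed pairs. That threshold, the function
names (cooccurrence_counts, associated_pairs, expand), and the sample
collection are assumptions made purely for illustration. Only first-order
associations are handled; higher-order associations are not attempted here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count, for each unordered keyword pair, the documents containing both."""
    counts = Counter()
    for terms in docs:
        # Sort the distinct terms so each pair has one canonical ordering.
        for a, b in combinations(sorted(set(terms)), 2):
            counts[(a, b)] += 1
    return counts


def associated_pairs(docs):
    """Keep pairs that co-occur more often than the mean over observed pairs.

    The mean-based threshold is an illustrative assumption, not the
    criterion used in the cited associative models.
    """
    counts = cooccurrence_counts(docs)
    if not counts:
        return {}
    mean = sum(counts.values()) / len(counts)
    return {pair: n for pair, n in counts.items() if n > mean}


def expand(query_terms, pairs):
    """Add the first-order associates of each query term to the query."""
    expanded = set(query_terms)
    for a, b in pairs:
        if a in query_terms:
            expanded.add(b)
        if b in query_terms:
            expanded.add(a)
    return expanded


# Hypothetical four-document collection, each document given as its keyword set.
docs = [
    ["index", "retrieval", "document"],
    ["index", "retrieval", "thesaurus"],
    ["index", "retrieval"],
    ["document", "thesaurus"],
]
pairs = associated_pairs(docs)
expanded_query = expand({"index"}, pairs)
```

In this sample collection only the pair ("index", "retrieval") co-occurs more
often than the mean, so a query on "index" is expanded to include "retrieval",
which is one simple way such associations could influence query-document
matching.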
B. Semantic Techniques
An important alternative to the statistical associative
process consists in providing a specific semantic model in the index
transformation directly. The indexing function may then be implemented
by a thesaurus mapping containing a predefined set of semantic
associations. A thesaurus transformation may be defined as a
many-to-many mapping from recognizable word types or phrases to thesaurus