ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
applicable to both statistical and semantic processing, since the meaning
of a word is generally invariant over its morphological variants. Much
more ambitious syntactic processing procedures are also under
investigation, including the use of fully automatic syntactic analysis.
A full sentence-by-sentence syntactic analysis could, for example,
provide explicit dependency relations among the various semantic
elements of a sentence, and could be used for phrase recognition or for
the recognition of structurally constrained associations among semantic
terms. At the present time it is not clear whether the complexity
required for the recognition of complex structural constraints is
justified in terms of the additional information extracted thereby.
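The claim that a word's meaning is largely invariant over its morphological variants is what justifies conflating those variants into a single index term. The sketch below illustrates that idea with a crude longest-match suffix stripper. It is not the procedure used in this report. The suffix list SUFFIXES and the minimum stem length MIN_STEM are hypothetical choices made for illustration, and the helper names strip_suffix and conflate are likewise invented here.

```python
# Hypothetical suffix list for illustration only; not the suffix
# dictionary of any particular indexing system.
SUFFIXES = ("ations", "ation", "ings", "ing", "ers", "er", "es", "ed", "s")

# Hypothetical guard: never strip a suffix if fewer than this many
# characters of stem would remain (so "sing" is not reduced to "s").
MIN_STEM = 3


def strip_suffix(word: str) -> str:
    """Remove the longest listed suffix from a word (lowercased first)."""
    word = word.lower()
    # Try longer suffixes first so "indexing" loses "ing", not just "g".
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM:
            return word[: -len(suffix)]
    return word


def conflate(words):
    """Group words by stem so morphological variants share one index term.

    Returns a dict mapping each stem to the sorted list of distinct
    lowercased words that reduce to it.
    """
    groups = {}
    for w in words:
        groups.setdefault(strip_suffix(w), set()).add(w.lower())
    return {stem: sorted(variants) for stem, variants in groups.items()}
```

For example, `conflate(["Index", "indexes", "indexed", "indexing", "indexer"])` places all five forms under the single stem `"index"`. A statistical or semantic procedure can then treat them as one term. A fixed suffix list like this will both over- and under-conflate some words. Real stem analysis typically adds exception lists or dictionaries, which this sketch omits.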
4. The Structure of Index Representations
The index transformation represents a mapping from the natural
language of the source text to the target or index language. The index
image of a source document is thus a representation of the content of
the document in this target language. The most commonly used index
language structure is the description list, or property vector, in
which the index image consists of a list of those properties of a finite
set which characterize the document. Index images of this type are
used, for example, in Uniterm systems, where the document representation
is an unordered set of keywords (descriptors, uniterms, etc.). If the
property set is ordered, for example, by a 1 to 1 mapping to the set of
integers, the index image may be encoded as a binary vector. A more
general representation of the same type allows for a quantization of the