ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Indexing Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
CHAPTER 2
INDEXING FUNCTION
1. Introduction
The nature of indexing is examined in this chapter with
emphasis on its relation to the document retrieval process. In this
context, document indexing may be viewed as a nonreversible (information
lossy) transformation from the natural language to an artificial
language (the index language), suitable for retrieval purposes. The
index transformation is designed in general to accomplish two
objectives: it serves on the one hand to trade amount or quantity of
information for search speed (indexing produces an information
compression), and on the other hand the index transformation serves a
language normalization function. Since indexing produces information
compression, the index representation of a document can be stored and
manipulated with greater facility than a representation of the
original text. The index transformation also serves as a language
normalization function in the sense that both vocabulary and structure
in the index language can be controlled, whereas in the natural
language they cannot. The general goal of the indexing function in
the context of document retrieval, then, is to provide a compact
representation of the information content of source documents (or
arbitrary segments of natural language texts) in a controlled format.
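
The two objectives named above, compression and normalization, can be
illustrated with a minimal sketch. It is not the procedure described in
this report: the controlled vocabulary, the name index_transform, and the
two sample texts are hypothetical and chosen only for illustration. The
sketch maps free text onto a small set of controlled index terms and
shows that two differently worded texts receive the same index
representation, which is why the transformation cannot be reversed.

```python
import re

# Hypothetical controlled vocabulary (an assumption for illustration):
# several natural-language word forms map to one index term.
CONTROLLED_VOCABULARY = {
    "retrieval": "RETRIEVAL",
    "retrieve": "RETRIEVAL",
    "retrieving": "RETRIEVAL",
    "search": "RETRIEVAL",
    "document": "DOCUMENT",
    "documents": "DOCUMENT",
    "text": "DOCUMENT",
    "texts": "DOCUMENT",
    "index": "INDEXING",
    "indexing": "INDEXING",
}


def index_transform(text):
    """Map free text to a sorted tuple of controlled index terms.

    Words outside the vocabulary, word order, and repetition are all
    discarded, so the original text cannot be recovered from the result.
    """
    words = re.findall(r"[a-z]+", text.lower())
    terms = {CONTROLLED_VOCABULARY[w] for w in words if w in CONTROLLED_VOCABULARY}
    return tuple(sorted(terms))


doc_a = "Indexing documents for retrieval."
doc_b = "We index a text so that a search can retrieve it."

rep_a = index_transform(doc_a)
rep_b = index_transform(doc_b)

print(rep_a)
print(rep_b)
```

Both sample texts reduce to the same three index terms. The shorter
representation is cheaper to store and compare (compression), and the
variation in wording between the two texts has been removed
(normalization), at the cost of everything else the texts said.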