Scientific Report No. ISR-10
Information Storage and Retrieval

Chapter: The Indexing Function

Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

CHAPTER 2
INDEXING FUNCTION

1. Introduction

This chapter examines the nature of indexing, with emphasis on its relation to the document retrieval process. In this context, document indexing may be viewed as a nonreversible (information-lossy) transformation from the natural language to an artificial language (the index language) suitable for retrieval purposes. In general, the index transformation is designed to accomplish two objectives. On the one hand, it trades quantity of information for search speed, since indexing produces an information compression. On the other hand, it serves a language normalization function.

Because indexing produces information compression, the index representation of a document can be stored and manipulated with greater facility than a representation of the original text. The index transformation serves as a language normalization function in the sense that both the vocabulary and the structure of the index language can be controlled, whereas those of the natural language cannot.

The general goal of the indexing function in the context of document retrieval, then, is to provide a compact representation of the information content of source documents (or of arbitrary segments of natural language text) in a controlled format.
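As a rough illustration of these two properties, the following Python sketch maps text onto a small controlled vocabulary. The vocabulary `CONTROLLED_VOCABULARY` and the function `index_text` are hypothetical, chosen only for this example. The report does not prescribe this or any other particular indexing procedure. Synonyms and inflected forms collapse onto a single index term, so the vocabulary is controlled (normalization). Unrecognized words, word order, and repetition are dropped, so the output is smaller than the input and two different sentences can receive the same representation (lossy, nonreversible compression).

```python
import re

# Hypothetical controlled vocabulary (illustrative only, not from the report):
# each recognized natural-language word maps to one normalized index term.
CONTROLLED_VOCABULARY = {
    "retrieval": "RETRIEVAL",
    "retrieve": "RETRIEVAL",
    "retrieving": "RETRIEVAL",
    "document": "DOCUMENT",
    "documents": "DOCUMENT",
    "paper": "DOCUMENT",
    "papers": "DOCUMENT",
    "index": "INDEXING",
    "indexing": "INDEXING",
    "indexed": "INDEXING",
}


def index_text(text: str) -> frozenset:
    """Map natural-language text to a set of controlled index terms.

    Words outside the vocabulary, word order, and repetition are all
    discarded, so the mapping compresses the text and cannot be inverted.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return frozenset(
        CONTROLLED_VOCABULARY[w] for w in words if w in CONTROLLED_VOCABULARY
    )


if __name__ == "__main__":
    a = index_text("Indexed papers are easier to retrieve.")
    b = index_text("Retrieving documents depends on how well they were indexed.")
    print(sorted(a))  # ['DOCUMENT', 'INDEXING', 'RETRIEVAL']
    print(a == b)     # True: two different texts share one index representation
```

The sketch controls only the vocabulary of the index language. A flat set of terms cannot express the controlled structure (for example, relations between terms) that the text also attributes to an index language.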