Scientific Report No. ISR-10
Information Storage and Retrieval
The Indexing Function (chapter)
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

and linguistic style, and due to the difficulties of extracting contextual information, any set of properties chosen to encode the information content of documents or search requests in a given field must reflect statistical approximations over the usage of the detected features. Such a statistical basis is clearly evident in the statistical association indexing model discussed earlier, where it forms an explicit part of the index representation. In various other indexing schemes such as manual descriptor indexing, or in mechanized thesaurus indexing, the statistical approximations are, in effect, hidden in the decision rules incorporated in the index transformation. This necessary statistical basis for document content encoding is emphasized because of its significance in terms of the problems of generating, maintaining, and evaluating indexing schemes.

Consider as a concrete example the indexing model specifically assumed in this report. The semantic associations incorporated in the thesaurus mapping from word stems into thesaurus or concept categories can be established on an ad hoc basis, reflecting individual or collective value judgments. It is possible, however, to subject these value judgments to experimental verification. Assume, for example, that a given set of natural language terms (words, phrases, etc.) is mapped into a single attribute of the index space, i.e. all the elements of the set have been judged to be sufficiently associated so as to be treated as a unit in the index language. It is [illegible] the occurrence of this