Scientific Report No. ISR-10, Information Storage and Retrieval. Chapter: The Indexing Function. Joseph John Rocchio, Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the statistical approximations necessary in a context independent framework must necessarily distort the characterization of the document's content. This suggests that there are essentially two alternatives to improving an already statistically optimized index transformation. One method clearly involves the incorporation of context dependent recognition procedures into the content detection process. In some sense, this is approximated by encoding larger segments of the natural language text, e.g. phrases instead of words, or sentences instead of phrases. Alternatively, context dependence can be introduced by multi-level recognition procedures in which the decision rules are altered by global interpretation of a context free encoding, thereby producing a second context dependent index representation.

Consider again a thesaurus transformation of the type illustrated in Figure 2.1. Assuming that all ambiguous input terms (terms which map into more than one thesaurus category) are mapped with statistically derived weights as described above, one can expect that the correct context will be reinforced over all the term encodings characterizing the document, whereas the incorrect ones will not. The term "channel," mapped as shown in Figure 2.1, is initially associated with two alternative contexts. After the entire initial encoding is completed, it should be possible to derive a total score for context "magnetic disk" versus the context "information transmission" by comparing the total weights of all