Scientific Report No. ISR-10
Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

classification categories. Thus, for example, there may be no requirements for such categories to be intelligible to the users of the system if, in effect, they are used for purely internal storage organization. The tailoring of the classification process to the internal document searching operations of the retrieval system offers, then, an increased degree of flexibility which can be exploited to optimize the overall search strategy.

For present purposes, the following assumptions summarize the role of automatic document classification in a mechanized retrieval system:

1. The discernable information content of source documents which serve as the basis for classification is contained in the collection of index images to be used for detailed query-document comparison.

2. The objective of classification is to induce a storage organization which allows a limited search to retrieve the same documents as would be retrieved by a search of the full source collection.

3. The characteristics of the classification should be such that it jointly maximizes the search efficiency of the system and minimizes the associated loss of relevant documents.

On the basis of these assumptions it is clear that the nature of the query-document matching function is critical to the automatic classification process. In particular, to satisfy the objective