Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
classification categories. Thus, for example, there may be no
requirements for such categories to be intelligible to the users of
the system if, in effect, they are used for purely internal storage
organization. The tailoring of the classification process to the
internal document searching operations of the retrieval system
offers, then, an increased degree of flexibility which can be
exploited to optimize the overall search strategy.
For present purposes, the following assumptions summarize the role of
automatic document classification in a mechanized retrieval system:
1. The discernable information content of source documents
which serve as the basis for classification is contained
in the collection of index images to be used for
detailed query-document comparison.
2. The objective of classification is to induce a storage
organization which allows a limited search to retrieve
the same documents as would be retrieved by a search of
the full source collection.
3. The characteristics of the classification should be such
that it jointly maximizes the search efficiency of the
system and minimizes the associated loss of relevant
documents.
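The three assumptions can be illustrated with a minimal sketch. It is not the report's procedure. It assumes that index images are sparse term-weight vectors, that the matching function is the cosine correlation, and that each class is represented by the mean vector of its members. The function names, the toy collection, the hand-assigned groups, and the threshold value are all hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts).
    Assumed matching function; the text above does not fix one."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def centroid(vectors):
    """Mean term-weight vector of a group of index images."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def full_search(query, docs, threshold):
    """Compare the query with every index image in the collection."""
    return {d for d, vec in docs.items() if cosine(query, vec) >= threshold}

def limited_search(query, docs, groups, threshold, n_groups=1):
    """Compare the query only with members of the best-matching groups."""
    ranked = sorted(
        groups,
        key=lambda g: cosine(query, centroid([docs[d] for d in groups[g]])),
        reverse=True,
    )
    candidates = [d for g in ranked[:n_groups] for d in groups[g]]
    hits = {d for d in candidates if cosine(query, docs[d]) >= threshold}
    return hits, len(candidates)

# Hypothetical toy collection of index images and a hand-made classification.
docs = {
    "d1": {"retrieval": 2, "index": 1},
    "d2": {"retrieval": 1, "query": 2},
    "d3": {"circuit": 2, "voltage": 1},
    "d4": {"circuit": 1, "amplifier": 2},
}
groups = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
query = {"retrieval": 1, "query": 1}

full = full_search(query, docs, 0.3)
limited, examined = limited_search(query, docs, groups, 0.3)
fraction_examined = examined / len(docs)  # search effort (assumption 3)
lost = full - limited                     # documents missed by the limited search
```

In this toy case the limited search examines half of the collection and loses none of the documents found by the full search. With a poorly chosen classification, `lost` would be non-empty, which is the loss that assumption 3 asks the classification to keep small.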
On the basis of these assumptions it is clear that the nature
of the query-document matching function is critical to the automatic
classification process. In particular, to satisfy the objective