ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
employed in the index space. Under these circumstances there always
exists some query q0 such that
(q0,d1) =
and
[OCRerr](q0,d2) [OCRerr] +*[OCRerr]
so that regardless of how close two document images are, they do not
belong to an equivalence class with respect to retrieval unless they are
in fact identical.
Under these circumstances it is clear that in order to reduce
the number of comparisons required in a retrieval operation, it will be
necessary to introduce some finite probability of error. Thus, since
the classification categories cannot be identified with equivalence
classes under matching functions of interest, a limited search strategy
may fail to retrieve some documents which would be retrieved by a full
search over the entire collection. The design of a classification
system, then, must involve a tradeoff between the total number of
comparisons (search efficiency) and the probability of loss of relevant
documents (versus retrieval by a full search).
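
The tradeoff described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the report: the documents are points in a plane, the matching function is Euclidean distance with a retrieval threshold `radius`, and the limited search examines only the category whose (hypothetical) centroid lies nearest the query. The sketch counts comparisons for a full search and for the limited search, and reports which documents the limited search fails to retrieve.

```python
import math

def dist(a, b):
    """Euclidean distance between two 2-D points (assumed metric)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def full_search(query, docs, radius):
    """Compare the query with every document; return (hits, comparisons)."""
    hits = {i for i, d in enumerate(docs) if dist(query, d) <= radius}
    return hits, len(docs)

def limited_search(query, docs, centroids, labels, radius):
    """Compare the query with each category centroid, then search only
    the members of the nearest category; return (hits, comparisons)."""
    nearest = min(range(len(centroids)),
                  key=lambda c: dist(query, centroids[c]))
    members = [i for i, lab in enumerate(labels) if lab == nearest]
    hits = {i for i in members if dist(query, docs[i]) <= radius}
    return hits, len(centroids) + len(members)

# Hypothetical collection: two categories of three documents each.
docs = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0),
        (4.0, 0.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 0, 1, 1, 1]
centroids = [(0.5, 0.5), (4.5, 0.5)]  # illustrative category representatives

# A query lying between the two categories, near the category boundary.
query = (2.4, 0.0)
radius = 1.7

full_hits, full_cost = full_search(query, docs, radius)
lim_hits, lim_cost = limited_search(query, docs, centroids, labels, radius)
missed = full_hits - lim_hits

print("full search:   ", sorted(full_hits), "comparisons:", full_cost)
print("limited search:", sorted(lim_hits), "comparisons:", lim_cost)
print("lost by limited search:", sorted(missed))
```

In this toy collection the limited search saves only one comparison, yet it loses a document that lies within the threshold but belongs to the category that was not searched. With larger collections the saving in comparisons grows, while queries near category boundaries remain the ones exposed to loss.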
4. Classification and Metric Searching
The two previously considered metric query-document matching
functions did not lead to an equivalence class partition of the
reference collection. Metric comparison measures do, however, have a