ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
collection of keywords, the comparison operation may take a variety of
forms, as summarized in Table 4.1. In general, selection of reference
documents based on an equality match between the query and document
set images is too restrictive for practical consideration. Partitioning
a reference collection into retrieved and rejected subsets by the
inclusion relation, however, has been used in many practical retrieval
systems. In this case selected documents (members of the retrieved
subset $D'$) are defined by:
$$D' = \{\, d_i : d_i \supseteq q \,\}$$
where $d_i$ and $q$ are keyword sets, and $D'$ is a subset of the source
collection of document images $D$. Figure 4.1(b) illustrates this process
for the collection and query of part (a).
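The inclusion match described above can be sketched in a few lines. This is a minimal illustration only: the function name `retrieve_by_inclusion` and the three-document collection are hypothetical and are not the data of Figure 4.1. It assumes each document image and the query are plain keyword sets, and that a document is selected when it contains every query keyword.

```python
def retrieve_by_inclusion(documents, query):
    """Return the indices of documents whose keyword set includes the query set.

    Assumption: `documents` is a list of keyword collections and `query`
    is a keyword collection; names and data are illustrative only.
    """
    q = frozenset(query)
    # q <= d is Python's subset test: every query keyword occurs in d.
    return [i for i, d in enumerate(documents) if q <= frozenset(d)]


def retrieve_by_equality(documents, query):
    """Equality match, shown for contrast with the inclusion match."""
    q = frozenset(query)
    return [i for i, d in enumerate(documents) if frozenset(d) == q]


# Hypothetical collection of document images (keyword sets).
docs = [
    {"retrieval", "index", "query"},
    {"index", "storage"},
    {"query", "retrieval"},
]
query = {"query", "retrieval"}

included = retrieve_by_inclusion(docs, query)
equal = retrieve_by_equality(docs, query)
```

On this sample the equality match selects fewer documents than the inclusion match, which is consistent with the remark that equality is too restrictive for practical use.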
In many instances it is desirable to have the response of the
retrieval system be an assignment of values to all documents in the
collection, where the values reflect relevance to the query. Both the
overlap and metric distance functions are typical of the matching
operations of this type. The overlap coefficient merely measures the
number of common elements in the two object sets, whereas the distance
function (developed by Rial, reference 1) induces a measure with the
metric properties of ordinary distance. Figure 4.1 (c) and (d)
provides an illustration of values assigned by each of these comparison
operations.
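The two value-assigning comparisons can be illustrated as follows. This is a sketch under stated assumptions: the overlap is taken as the count of shared keywords, as the text says, but the formula of Rial's distance function is not given in this passage, so the size of the symmetric difference is used here as a stand-in that is known to satisfy the metric properties. The function names and the sample collection are hypothetical.

```python
def overlap(d, q):
    """Number of keywords common to the document and the query."""
    return len(set(d) & set(q))


def set_distance(d, q):
    """Size of the symmetric difference of the two keyword sets.

    Assumption: a stand-in for the metric distance function cited in the
    text, whose exact form is not reproduced in this passage.
    """
    return len(set(d) ^ set(q))


def score_collection(documents, query, score):
    """Assign a value to every document in the collection."""
    return [score(d, query) for d in documents]


# Hypothetical collection of document images (keyword sets).
docs = [
    {"retrieval", "index", "query"},
    {"index", "storage"},
    {"query", "retrieval"},
]
query = {"query", "retrieval"}

overlaps = score_collection(docs, query, overlap)
distances = score_collection(docs, query, set_distance)
```

Unlike the inclusion match, every document receives a value, so the collection can be ranked rather than partitioned: a larger overlap or a smaller distance indicates a closer match to the query.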
An extension of the above matching operations on set-represented
operands can be made by exploiting the isomorphism of a