Scientific Report No. ISR-10
Information Storage and Retrieval
Chapter: The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

collection of keywords, the comparison operation may take a variety of forms, as summarized in Table 4.1. In general, selecting reference documents by an equality match between the query and document set images is too restrictive for practical use. Partitioning a reference collection into retrieved and rejected subsets by the inclusion relation, however, has been used in many practical retrieval systems. In this case the selected documents (the members of the retrieved subset) are defined by

$\{\, d_i \in D : d_i \supseteq q \,\}$

where $d_i$ and $q$ are keyword sets, and the retrieved subset is a subset of the source collection of document images $D$. Figure 4.1(b) illustrates this process for the collection and query of part (a).

In many instances it is desirable for the response of the retrieval system to be an assignment of values to all documents in the collection, where the values reflect relevance to the query. Both the overlap and the metric distance functions are typical matching operations of this type. The overlap coefficient merely measures the number of common elements in the two object sets, whereas the distance function (developed by Rial, reference 1) induces a measure with the metric properties of ordinary distance. Figure 4.1(c) and (d) illustrate the values assigned by each of these comparison operations.

An extension of the above matching operations on set-represented operands can be made by exploiting the isomorphism of a