ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
index images as for query images, document-document distance is defined
and possesses the same properties as query-document distance. By
virtue of metric property (iii), the triangle inequality, a search
request which is close (related) to a given document d must necessarily
be close to all documents which are themselves close to d.
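Stated as an inequality, and writing $\rho$ for the distance function (a symbol assumed here for illustration), the triangle inequality gives the following for a query $q$, a document $d$, and any other document $d'$:

```latex
\rho(q, d') \le \rho(q, d) + \rho(d, d')
```

If both terms on the right are small, then $\rho(q, d')$ is small as well.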
Let a set of documents $D_c$, grouped as a classification
category, be confined to a region of the index space such that:
$\rho(d_i, c) < \delta$ for all $d_i \in D_c$,
where c is an arbitrary vector in this region. Let the distance from a
query q to the vector c be
$\rho(q, c) = \rho_0$.
The metric properties of the distance function allow the distance
between q and the members of $D_c$ to be bounded as follows:
$\max(0, \rho_0 - \delta) \le \rho(q, d_i) \le \rho_0 + \delta$
for all $d_i \in D_c$. Thus the single distance $\rho(q, c)$ provides a bound on
the set of distances from q to the members of the document set $D_c$. The
following discussion is limited to the vector indexing model and
angular distance matching with the understanding that it is generally
applicable to any system employing a metric similarity measure.
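The bound described above can be checked numerically. The following is a minimal sketch, not the report's own procedure: it assumes angular distance is computed as the arc cosine of the cosine correlation, and the names `angular_distance`, `category_bounds`, `centre`, `radius`, and the sample vectors are all hypothetical, introduced only for illustration.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two non-zero vectors (assumed distance measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Clamp to [-1, 1] to guard against floating-point overshoot before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.acos(cosine)


def category_bounds(query, centre, radius):
    """Lower and upper bounds on the distance from the query to any document
    lying within `radius` of `centre`, obtained from the triangle inequality."""
    rho_0 = angular_distance(query, centre)
    return max(0.0, rho_0 - radius), rho_0 + radius


# Hypothetical category: a reference vector and three document vectors near it.
centre = [1.0, 1.0, 0.0]
documents = [[1.0, 0.8, 0.1], [0.9, 1.0, 0.0], [1.0, 1.0, 0.2]]
query = [0.0, 1.0, 1.0]

# Use the largest document-to-centre distance as the radius of the region.
radius = max(angular_distance(d, centre) for d in documents)
lower, upper = category_bounds(query, centre, radius)

# Every actual query-document distance should fall inside the bounds
# (a small tolerance absorbs rounding error).
distances = [angular_distance(query, d) for d in documents]
within = all(lower - 1e-12 <= dist <= upper + 1e-12 for dist in distances)
print(lower, upper, distances, within)
```

Only one distance computation against `centre` is needed to bound the distances to every document in the category, which is the point of the argument: a category whose lower bound is already too large can be set aside without examining its members.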
In the vector model, document or query index images are treated