ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
particular significance to automatic classification of the type being
considered. The similarity measures:
$$\delta(q,d) = 1 - \frac{n(q \cap d)}{n(q \cup d)}$$
for set represented index images and
$$\theta(q,d) = \cos^{-1}(q,d), \qquad -180^\circ \le \theta \le 180^\circ$$
where
$$\cos(q,d) = \frac{q \cdot d}{|q|\,|d|} \qquad (4.1),$$
for vector represented index images are special cases of a class of
matching functions which possess the so-called "metric" property of
ordinary distance. A metric is characterized by:
(i) $\delta(\alpha,\beta) = 0$ iff $\alpha = \beta$, while $\delta(\alpha,\beta) > 0$ if $\alpha \neq \beta$
(ii) $\delta(\alpha,\beta) = \delta(\beta,\alpha)$ (symmetry)
(iii) $\delta(\alpha,\beta) + \delta(\beta,\gamma) \geq \delta(\alpha,\gamma)$ (triangle inequality).
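The three properties can be checked numerically on a small sample of index images. The following Python sketch is illustrative only and is not part of the original report: the function names `set_distance`, `angular_distance` and `check_axioms`, the tolerance, and the treatment of two empty sets as distance 0 are all assumptions. It also assumes non-zero vectors of equal length. Note that `math.acos` returns an angle between 0 and 180 degrees.

```python
import math
from itertools import product

def set_distance(q, d):
    """1 - n(q ∩ d) / n(q ∪ d) for index images given as sets of terms."""
    q, d = set(q), set(d)
    union = q | d
    if not union:
        return 0.0  # assumption: two empty index images are at distance 0
    return 1.0 - len(q & d) / len(union)

def angular_distance(q, d):
    """Angle in degrees between two non-zero term vectors of equal length."""
    dot = sum(a * b for a, b in zip(q, d))
    norm = math.sqrt(sum(a * a for a in q) * sum(b * b for b in d))
    cosine = max(-1.0, min(1.0, dot / norm))  # clamp floating-point rounding
    return math.degrees(math.acos(cosine))

def check_axioms(points, dist, tol=1e-6):
    """Return (identity, symmetry, triangle) for properties (i), (ii), (iii)."""
    pairs = list(product(points, repeat=2))
    # (i) distance is zero exactly when the two points are equal
    identity = all((dist(a, b) <= tol) == (a == b) for a, b in pairs)
    # (ii) symmetry
    symmetry = all(abs(dist(a, b) - dist(b, a)) <= tol for a, b in pairs)
    # (iii) triangle inequality over every ordered triple
    triangle = all(
        dist(a, b) + dist(b, c) + tol >= dist(a, c)
        for a, b, c in product(points, repeat=3)
    )
    return identity, symmetry, triangle

if __name__ == "__main__":
    sets = [frozenset("ab"), frozenset("bc"), frozenset("abc"), frozenset("d")]
    vectors = [(1, 0), (0, 1), (1, 1), (2, 0)]
    print(check_axioms(sets, set_distance))
    print(check_axioms(vectors, angular_distance))
```

On such a sample the angular measure fails property (i) when the points are taken to be the vectors themselves, because parallel vectors of different length, such as (1, 0) and (2, 0), are separated by an angle of zero. A check of this kind covers only the sampled points and does not constitute a proof.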
(Since the index images of two distinct documents can in theory be
identical, the distance function in this case is more precisely
characterized as being a pseudo-metric, i.e. a function satisfying
metric properties (ii) and (iii) for which $\delta(\alpha,\alpha) = 0$.)
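The pseudo-metric case can be illustrated with a small sketch in Python. The document identifiers, the index terms and the helper `zero_distance_pairs` are hypothetical and chosen only for illustration. Two distinct documents that share the same set of index terms are separated by a distance of zero under the set measure.

```python
from itertools import combinations

def set_distance(q, d):
    """1 - n(q ∩ d) / n(q ∪ d) for index images given as sets of terms."""
    q, d = set(q), set(d)
    union = q | d
    if not union:
        return 0.0  # assumption: two empty index images are at distance 0
    return 1.0 - len(q & d) / len(union)

def zero_distance_pairs(index_images, dist):
    """List pairs of distinct document ids whose index images are at distance 0."""
    return [
        (a, b)
        for a, b in combinations(sorted(index_images), 2)
        if dist(index_images[a], index_images[b]) == 0
    ]

# Hypothetical collection: doc_A and doc_B are different documents
# that happen to receive identical index images.
index_images = {
    "doc_A": {"retrieval", "index"},
    "doc_B": {"index", "retrieval"},
    "doc_C": {"retrieval", "cluster"},
}

if __name__ == "__main__":
    print(zero_distance_pairs(index_images, set_distance))
```

Here the distance is a metric on index images but only a pseudo-metric on the documents themselves, since zero distance does not imply that the documents are the same.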
A metric matching function is well suited for automatic
classification since it effectively introduces sufficient structure
on the index space to allow groups of related documents to be
identified. Since the same structure has been assumed for document