Scientific Report No. ISR-10, Information Storage and Retrieval. The Query-Document Matching Function. Joseph John Rocchio, Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

particular significance to automatic classification of the type being considered. The similarity measures

$$\delta(q,d) = 1 - \frac{n(q \cap d)}{n(q \cup d)}$$

for set represented index images, and

$$\theta(q,d) = \cos^{-1}\!\left(\frac{q \cdot d}{|q|\,|d|}\right), \qquad -180^\circ \le \theta \le 180^\circ \tag{4.1}$$

for vector represented index images, are special cases of a class of matching functions which possess the so-called "metric" property of ordinary distance. A metric is characterized by:

(i) $\delta(\alpha,\beta) = 0$ iff $\alpha = \beta$, while $\delta(\alpha,\beta) > 0$ if $\alpha \ne \beta$
(ii) $\delta(\alpha,\beta) = \delta(\beta,\alpha)$ (symmetry)
(iii) $\delta(\alpha,\beta) + \delta(\beta,\gamma) \ge \delta(\alpha,\gamma)$ (triangle inequality).

(Since the index images of two distinct documents can in theory be identical, the distance function in this case is more precisely characterized as being a pseudo-metric, i.e. a function satisfying metric properties (ii) and (iii) for which $\delta(\alpha,\alpha) = 0$.)

A metric matching function is well suited for automatic classification since it effectively introduces sufficient structure on the index space to allow groups of related documents to be identified. Since the same structure has been assumed for document