Scientific Report No. ISR-10
Information Storage and Retrieval
Chapter: The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

where [OCRerr] is defined according to equation (4.1). The classification vector c (or more precisely its orientation) is the best single representation for all of the elements in the set D under the assumption that the information carried by an index vector is contained in its angular position. In the geometrical interpretation, the vector c/|c| terminates at the centroid of the point distribution on the unit N-sphere representing the vectors d_i/|d_i|. In particular, then, if the elements in D are sufficiently close to one another, c must be close to all of them.

With respect to the classification problem, if the members of D are to be grouped into a classification category, c can be considered to be the best classification "head" or representation for the category. This property of the centroid vector together with the metric properties of angular query-document matching will be used as a basis for an automatic classification algorithm suitable for storage organization in the vector indexing model.

3. Description of the Classification Algorithm

The objective of the classification process is to generate a set of categories or document subsets, each represented by a classification vector (equation (4.2)), from the source collection. The properties of the classification system should result in increased search efficiency in a document retrieval system. The storage organization induced by a classification of this type leads to a two-level search algorithm.
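The role of the classification vector in a two-level search can be illustrated with a minimal sketch. Equation (4.2) is not reproduced in this excerpt, so the code assumes that c is the sum of the unit-normalized document vectors d_i/|d_i| in a category, and that matching uses the cosine of the angle between vectors. The function names (`classification_vector`, `two_level_search`) and the `top_categories` parameter are hypothetical and are not taken from the report.

```python
import math


def normalize(v):
    """Scale a vector to unit length so only its angular position matters."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine of the angle between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def classification_vector(docs):
    """Assumed form of equation (4.2): sum of unit-normalized document vectors."""
    c = [0.0] * len(docs[0])
    for d in docs:
        for i, x in enumerate(normalize(d)):
            c[i] += x
    return c


def two_level_search(query, categories, top_categories=1):
    """Level 1: rank categories by query/classification-vector cosine.
    Level 2: rank only the documents inside the best categories."""
    ranked = sorted(
        categories,
        key=lambda docs: cosine(query, classification_vector(docs)),
        reverse=True,
    )
    candidates = [d for docs in ranked[:top_categories] for d in docs]
    return sorted(candidates, key=lambda d: cosine(query, d), reverse=True)


# Two small illustrative categories in a 3-term index space (made-up data).
categories = [
    [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]],
    [[0.0, 1.0, 0.0], [0.0, 0.8, 0.2]],
]
query = [1.0, 0.0, 0.0]
result = two_level_search(query, categories)
```

In this sketch the query is compared with one classification vector per category and then only with the documents of the selected category, rather than with every document in the collection. That reduction in comparisons is the kind of search efficiency the text refers to.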
Consider an "input item" which is to be compared with each member of a collection of N elements so that those elements which