Scientific Report No. ISR-10
Information Storage and Retrieval
Chapter: The Query-Document Matching Function
Joseph John Rocchio, Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

as multidimensional Cartesian vectors. Under the angular distance similarity measure, two documents are similar when the angle between their index vectors is small. A classification category should therefore consist of a set of document images confined within a localized hypercone of the index space. Alternatively, if the index images are pictured as unit vectors terminating on the unit hypersphere, a classification category should consist of a set of documents whose index vectors terminate within some local area on the surface of that hypersphere. In these terms, the problem of automatic document classification is to define the characteristics of such areas and to establish a procedure for identifying and representing them.

5. A Heuristic Classification Algorithm

A. Basic Concepts

Associated with an arbitrary set of document index vectors $D$, a classification vector $\mathbf{c}$ is defined by the equation

$$\mathbf{c} = \sum_{i=1}^{n} \frac{\mathbf{d}_i}{|\mathbf{d}_i|} \qquad (4.2)$$

where $D = \{\mathbf{d}_1, \mathbf{d}_2, \ldots, \mathbf{d}_n\}$. The vector $\mathbf{c}$ is the centroid, or center of gravity, of the set of unit vectors $\mathbf{d}_i / |\mathbf{d}_i|$ derived from the elements of $D$ (up to the factor $1/n$, which does not change its orientation). It therefore represents a vector with an orientation for which

$$\sum_{i=1}^{n} e(\mathbf{c}, \mathbf{d}_i) = 0$$
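The classification vector of equation (4.2) and the hypercone picture of a category can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the report's own procedure: index vectors are assumed to be plain lists of floats, the angular distance is assumed to be the angle $\arccos(\cos\theta)$ between two vectors, and the cone boundary is modeled by a hypothetical half-angle parameter `half_angle`, which the text does not specify. The function names are likewise hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def norm(v: Sequence[float]) -> float:
    """Euclidean length |v| of a vector."""
    return math.sqrt(sum(x * x for x in v))


def classification_vector(docs: Sequence[Sequence[float]]) -> Vector:
    """Sum of the unit vectors d_i / |d_i| over the set D, as in equation (4.2)."""
    if not docs:
        raise ValueError("D must contain at least one index vector")
    dim = len(docs[0])
    c = [0.0] * dim
    for d in docs:
        if len(d) != dim:
            raise ValueError("all index vectors must have the same dimension")
        length = norm(d)
        if length == 0.0:
            raise ValueError("a zero vector has no orientation")
        for k in range(dim):
            c[k] += d[k] / length  # add the k-th component of the unit vector
    return c


def angular_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in radians between u and v (assumed form of the angular measure)."""
    cos_theta = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))


def in_category(c: Sequence[float], d: Sequence[float], half_angle: float) -> bool:
    """Hypothetical test: does d lie inside the cone of this half-angle around c?"""
    return angular_distance(c, d) <= half_angle


if __name__ == "__main__":
    D = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0]]
    c = classification_vector(D)
    print(c)  # [1.0, 1.0, 0.0]
    print(angular_distance(c, D[0]))  # about pi/4
    print(in_category(c, [0.0, 0.0, 1.0], math.pi / 3))  # False
```

Because each $\mathbf{d}_i$ is normalized before summing, rescaling a document vector leaves $\mathbf{c}$ unchanged; only orientations contribute, which is consistent with an angular similarity measure.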