ISR10 Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function (chapter)
Joseph John Rocchio, Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

subset of index vectors for category formation are based on the number of elements in the subset as well as the mutual distance among the elements. Under these conditions a region of the index space with a high density of document vectors will yield categories in which all the documents are closely related (via the distance function), whereas in regions of relatively low density, categories covering a wider scope will be formed. Note that as the mutual distance among the members of a classification category increases, the classification vector becomes less representative of the group as a whole. There is therefore a definite tradeoff in category formation between producing categories of equal population on the one hand, and maintaining control of the distance relation among category members on the other.

Control of the classification categories is achieved by a set of input parameters to the algorithm which specify:

1. The number of categories desired
2. A lower and upper bound on the number of elements to be included in any classification subset
3. An upper bound on the distance (lower bound on the correlation coefficient) between a document and a classification vector such that the document is still considered to be associated with that vector.

In the course of the classification process each document may be associated with one of three possible states. Initially, all documents are considered to be "unclustered", implying that they have not been assigned to any classification category, nor is anything (