ISR10 Scientific Report No. ISR-10 Information Storage and Retrieval The Query-Document Matching Function chapter Joseph John Rocchio Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

with the one on which it is based. Each of these final classification vectors is again correlated with the entire document collection to define the resultant set of categories. At this point a document is associated with a category if it is above the cutoff of the classification vector of that category, or if it is not above any cutoff but is closest to said classification vector. Figure 4.7(c) illustrates the partition class which results in the classification vector of Figure 4.8(b); the correlation distribution of this vector, which specifies the final category, is shown in Figure 4.7(a).

At the end of the classification process, then, each classification vector represents all the documents with index vectors within the angular distance corresponding to its cutoff correlation, and additionally, a few documents outside this radius. Documents of the latter type, however, are closer to the vectors to which they are assigned than to any others of the set. Note that the final classification vectors are not necessarily the centroid vectors of the vector subset they represent, since the final categories are not in general identical to the partition class from which the centroid vector was formed. However, the final categories generally contain the members of the partition class in addition to documents which are multiply classified. This strategy provides a convenient means for generating multiple classifications for some documents, while maintaining a set of categories balanced over the entire collection.
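The final assignment rule described above can be illustrated with a minimal sketch. It is not the report's program: the function names (`cosine`, `assign_categories`), the use of cosine correlation as the matching function, and the sample vectors and cutoff values are all assumptions made for illustration. A document joins every category whose cutoff it exceeds; a document exceeding no cutoff joins only the category whose classification vector it is closest to.

```python
import math

def cosine(u, v):
    """Cosine correlation of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def assign_categories(docs, class_vectors, cutoffs):
    """For each document, return the list of category indices it joins."""
    assignments = []
    for doc in docs:
        corrs = [cosine(doc, c) for c in class_vectors]
        # Every category whose cutoff correlation the document exceeds.
        chosen = [k for k, r in enumerate(corrs) if r > cutoffs[k]]
        if not chosen:
            # Above no cutoff: fall back to the single closest vector.
            chosen = [max(range(len(corrs)), key=corrs.__getitem__)]
        assignments.append(chosen)
    return assignments

# Hypothetical two-term index vectors, chosen only to exercise each case.
docs = [
    [1.0, 0.0],   # aligned with category 0
    [0.0, 1.0],   # aligned with category 1
    [1.0, 1.0],   # above both cutoffs: multiply classified
    [-1.0, 0.2],  # above neither cutoff: assigned to the closest vector
]
class_vectors = [[1.0, 0.0], [0.0, 1.0]]
cutoffs = [0.7, 0.7]

result = assign_categories(docs, class_vectors, cutoffs)
print(result)
```

In this sketch the third document is multiply classified, and the fourth lies outside every cutoff radius yet is still placed, so every document in the collection ends up in at least one category.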
Table 4.3 summarizes the main parts of the classification algorithm, and an overall flowchart is given in Figure 4.[OCRerr].