ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Passes 1 and 2

1. Identify a dense set of unclustered document images.
2. Form the classification or centroid vector for this subset.
3. Identify all documents in the vicinity of the classification vector. Define a category by choosing a cutoff, and cluster documents in the category.

Pass 3

4. Partition the source collection on the basis of association with the set of classification vectors formed above.
5. Form the classification or centroid vector for each partition class.
6. Define the final set of categories for these classification vectors by correlation with the document collection and cutoff. Assign documents below all cutoffs on the basis of maximum association.

Table 4.3. Summary of the Steps of the Classification Algorithm
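
The six steps summarized in Table 4.3 can be illustrated with a minimal Python sketch. It is not the report's implementation. Several choices are assumptions made only for illustration: cosine correlation as the measure of association, a single `cutoff` shared by all categories, and a density test that requires at least `min_neighbors` unclustered documents (the seed included) within the cutoff of a seed document. The names `classify`, `cosine`, `centroid`, `cutoff`, and `min_neighbors` are hypothetical, and the two repeated passes of steps 1 to 3 are collapsed into one sweep for brevity.

```python
import math


def cosine(u, v):
    """Cosine correlation of two vectors (assumed association measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def classify(docs, cutoff=0.95, min_neighbors=2):
    """Sketch of steps 1-6; cutoff and min_neighbors are hypothetical."""
    unclustered = set(range(len(docs)))
    vectors = []

    # Steps 1-3 (Passes 1 and 2 in the table, shown here as one sweep).
    for seed in range(len(docs)):
        if seed not in unclustered:
            continue
        # Step 1: a dense set of unclustered documents around the seed.
        dense = [j for j in sorted(unclustered)
                 if cosine(docs[seed], docs[j]) >= cutoff]
        if len(dense) < min_neighbors:
            continue
        # Step 2: classification (centroid) vector for this subset.
        c = centroid([docs[j] for j in dense])
        # Step 3: documents in the vicinity of the vector form a category.
        members = {j for j in unclustered if cosine(c, docs[j]) >= cutoff}
        unclustered -= members
        vectors.append(c)

    if not vectors:
        return [], {}

    # Step 4 (Pass 3): partition the whole collection by best association.
    partition = {k: [] for k in range(len(vectors))}
    for j, d in enumerate(docs):
        best = max(range(len(vectors)), key=lambda k: cosine(vectors[k], d))
        partition[best].append(j)

    # Step 5: recompute the centroid vector of each partition class.
    final = [centroid([docs[j] for j in partition[k]]) if partition[k]
             else vectors[k] for k in range(len(vectors))]

    # Step 6: final categories by correlation and cutoff; a document below
    # all cutoffs is assigned to the category of maximum association.
    categories = {k: [] for k in range(len(final))}
    for j, d in enumerate(docs):
        scores = [cosine(c, d) for c in final]
        hits = [k for k, s in enumerate(scores) if s >= cutoff]
        if not hits:
            hits = [scores.index(max(scores))]
        for k in hits:
            categories[k].append(j)
    return final, categories
```

Because step 6 applies the cutoff to every classification vector independently, a document may fall into more than one category in this sketch, while a document that falls below every cutoff is still placed in exactly one.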