ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Passes 1 and 2
1. Identify a dense set of unclustered document images.
2. Form the classification or centroid vector for this subset.
3. Identify all documents in the vicinity of the classification vector. Define a category by choosing a cutoff, and cluster documents in the category.

Pass 3
4. Partition the source collection on the basis of association with the set of classification vectors formed above.
5. Form the classification or centroid vector for each partition class.
6. Define the final set of categories for these classification vectors by correlation with the document collection and cutoff. Assign documents below all cutoffs on the basis of maximum association.

Table 4.3. Summary of the Steps of the Classification Algorithm
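Table 4.3 gives the steps only in outline. The Python sketch below shows one way they might fit together, and it fills in several details the table leaves open. Treat each of these as an assumption, not as the report's method. Document images are represented as term-weight vectors, and correlation is the cosine measure. A "dense set" is taken to be an unclustered document together with its unclustered neighbours whose correlation with it reaches a hypothetical `density_cut`. The hypothetical parameters `min_dense` and `cutoff` stand in for values the table does not specify. The sketch also just repeats steps 1 to 3 until no dense set remains, and it does not attempt to reproduce how that work is divided between passes 1 and 2.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine correlation of two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Steps 2 and 5: the classification vector is taken as the mean member vector."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def best_match(doc: Vector, centroids: List[List[float]]) -> int:
    """Index of the classification vector most highly correlated with doc."""
    return max(range(len(centroids)), key=lambda k: cosine(doc, centroids[k]))


def classify(docs: List[Vector], density_cut: float = 0.8,
             min_dense: int = 2, cutoff: float = 0.7) -> Dict[int, List[int]]:
    # density_cut, min_dense and cutoff are hypothetical parameters of this sketch.
    n = len(docs)
    unclustered = set(range(n))
    centroids: List[List[float]] = []

    # Passes 1 and 2 (steps 1-3), repeated until no dense unclustered set remains.
    while unclustered:
        # Step 1 (assumed density test): the unclustered document with the most
        # unclustered neighbours at correlation >= density_cut, plus those neighbours.
        pool = sorted(unclustered)
        dense: List[int] = []
        for i in pool:
            near = [j for j in pool if cosine(docs[i], docs[j]) >= density_cut]
            if len(near) > len(dense):
                dense = near
        if len(dense) < max(min_dense, 1):
            break
        c = centroid([docs[j] for j in dense])  # step 2
        centroids.append(c)
        # Step 3: documents correlating with c at or above the cutoff count as
        # clustered; the dense set is removed too, so the loop always makes progress.
        in_vicinity = {j for j in range(n) if cosine(docs[j], c) >= cutoff}
        unclustered -= in_vicinity | set(dense)

    if not centroids:
        return {}

    # Pass 3, step 4: partition the whole collection by maximum association.
    classes: Dict[int, List[int]] = {}
    for j in range(n):
        classes.setdefault(best_match(docs[j], centroids), []).append(j)

    # Step 5: recompute a classification vector for each non-empty partition class.
    final = [centroid([docs[j] for j in members]) for _, members in sorted(classes.items())]

    # Step 6: a document joins every category whose cutoff it reaches (overlap is
    # allowed); a document below all cutoffs goes to its maximum-association category.
    categories: Dict[int, List[int]] = {k: [] for k in range(len(final))}
    for j in range(n):
        above = [k for k, c in enumerate(final) if cosine(docs[j], c) >= cutoff]
        for k in above or [best_match(docs[j], final)]:
            categories[k].append(j)
    return categories
```

Take the toy collection `[[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0], [0.3, 0.1, 1]]` with the default parameters. Passes 1 and 2 find two dense pairs. Document 4 stays unclustered and correlates below the cutoff with both final classification vectors, so step 6 assigns it by maximum association. The result is `{0: [0, 1, 4], 1: [2, 3]}`.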