Scientific Report No. ISR-10
Information Storage and Retrieval
The Query-Document Matching Function (chapter)
Joseph John Rocchio
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

START
Initialize. Read input parameters.

Any unflagged docs. left?
  Yes:
    Select one and correlate it with all unclustered docs.
    Test the density of highly correlating documents.
      Fail:
        Flag the selected document "loose".
        Save the initial part of the sorted correlation list.
      Pass:
        Derive cutoff correlation.
        Form centroid vector for subset above cutoff.
        Correlate centroid vector with all docs. and sort.
        Derive new cutoff corr.
        Flag docs. above cutoff "clustered".
        Update list of max. doc.-class. vector correlations.
  No:
    Is the no. of categories < number requested?
      Yes:
        Scan the saved corr. lists for docs. still flagged "loose" to find the one with max. unclustered doc. density.
        Correlate this doc. with all unclustered docs.
      No:
        Form N partition classes, one for each class. vector, from the max. corr. list.
        i ← 0
        i ← i + 1
        Form centroid vector of ith partition class.
        Correlate centroid vector with all documents.
        Derive cutoff corr. and assign docs. above cutoff to the ith category.
        Update the new list of max. doc.-class. vector corr.
        i = N?
          No: return to i ← i + 1.
          Yes:
            Assign docs. not above any cutoff by max. correlation.
            Print class. vectors and categories.

Figure 4.9. Flowchart of the Classification Algorithm
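
The steps named in the flowchart can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation. The flowchart does not say how the correlation is computed, how the density test works, or how cutoffs are derived, so the sketch assumes cosine correlation, a single fixed cutoff, and a density test that counts highly correlating unclustered documents. The names `cosine`, `centroid`, `classify`, `cutoff`, and `min_density` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine correlation of two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def classify(docs, n_requested, cutoff=0.9, min_density=2):
    # ASSUMPTION: a fixed cutoff stands in for every "derive cutoff" box.
    status = ["unflagged"] * len(docs)
    class_vectors = []

    def neighbours(seed):
        # The seed plus every unclustered doc correlating with it above cutoff.
        return [seed] + [j for j in range(len(docs))
                         if j != seed and status[j] != "clustered"
                         and cosine(docs[seed], docs[j]) >= cutoff]

    def form_cluster(seed):
        # Centroid of the subset above cutoff, correlated with all docs.
        c = centroid([docs[j] for j in neighbours(seed)])
        for j, d in enumerate(docs):
            if cosine(c, d) >= cutoff:
                status[j] = "clustered"
        status[seed] = "clustered"  # ASSUMPTION: the seed always joins its cluster
        class_vectors.append(c)

    # Pass 1: try each unflagged doc as a seed; failing the density test
    # (ASSUMPTION: a simple neighbour count) flags the doc "loose".
    for seed in range(len(docs)):
        if status[seed] != "unflagged":
            continue
        if len(neighbours(seed)) >= min_density:
            form_cluster(seed)
        else:
            status[seed] = "loose"

    # Pass 2: too few categories, so promote the densest "loose" docs.
    while len(class_vectors) < n_requested:
        loose = [j for j in range(len(docs)) if status[j] == "loose"]
        if not loose:
            break
        form_cluster(max(loose, key=lambda j: len(neighbours(j))))

    n = len(class_vectors)
    if n == 0:
        return [], []

    # Partition docs by maximum correlation with the class vectors.
    best = [max(range(n), key=lambda k: cosine(class_vectors[k], d)) for d in docs]

    final_vectors, categories = [], []
    for i in range(n):
        members = [docs[j] for j in range(len(docs)) if best[j] == i]
        c = centroid(members) if members else class_vectors[i]
        final_vectors.append(c)
        # Categories may overlap: every doc above cutoff joins category i.
        categories.append({j for j, d in enumerate(docs) if cosine(c, d) >= cutoff})

    # Docs not above any cutoff are assigned by maximum correlation.
    for j, d in enumerate(docs):
        if not any(j in cat for cat in categories):
            k = max(range(n), key=lambda k: cosine(final_vectors[k], d))
            categories[k].add(j)
    return final_vectors, categories
```

Calling `classify(docs, n_requested)` with a list of equal-length numeric vectors returns the class vectors and, for each one, a set of document indices. A document can appear in more than one set because the final cutoff test is applied per category.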