ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
The Query-Document Matching Function
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
START
Initialize
Read input parameters
Any unflagged docs. left?
Yes
Select one and correlate it
with all unclustered docs.
Test the density of highly
correlating documents
No
Fail Pass
Save the initial
part of the sorted
correlation list
Is the no. of categories
number requested?
Yes
Scan the saved corr. lists
for docs. still flagged
"loose" to find the one
with max. unclustered
doc. density
Correlate this doc. with
all unclustered docs.
Form N partition classes,
one for each class. vector,
from the max. corr. list
i ← i + 1
Form centroid vector of
ith partition class
Correlate centroid vector
with all documents
Derive cutoff corr. and
assign docs. above cutoff
to the ith category
Flag the selected
document "loose"
Derive cutoff correlation
Form centroid vector for
subset above cutoff
Correlate centroid vector
with all docs. and sort
Derive new cutoff corr.
Flag docs. above cutoff
"clustered"
Update list of max. doc.-
class. vector correlations
Update the new list of max.
doc.-class. vector corr.
No
Yes
Assign docs. not above any
cutoff by max. correlation
Print class. vectors and
categories
Flowchart of the Classification Algorithm
Figure 4.9
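The first pass of the flowchart (select an unflagged document, test the density of its highly correlating neighbors, then either flag it "loose" or form a centroid and flag the documents above cutoff "clustered") can be sketched roughly as follows. This is a simplified illustration, not the report's program. The names cosine, centroid, classify, min_corr and min_density are hypothetical; cosine correlation and a single fixed cutoff are assumptions standing in for the correlation measure and the derived cutoffs, which the figure does not specify. The rescan of "loose" documents and the check against the number of categories requested are omitted.

```python
import math


def cosine(u, v):
    """Cosine correlation of two vectors (assumed measure; 0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def classify(docs, min_corr=0.9, min_density=2):
    """Simplified single pass over the documents.

    min_corr    -- fixed cutoff correlation (assumption: the flowchart derives it)
    min_density -- minimum number of documents (seed included) that must
                   correlate above min_corr for the density test to pass
    Returns (class_vectors, categories, loose).
    """
    status = ["unflagged"] * len(docs)
    class_vectors, categories = [], []

    for seed in range(len(docs)):
        if status[seed] != "unflagged":
            continue
        # Correlate the selected document with all unclustered documents.
        unclustered = [j for j in range(len(docs)) if status[j] != "clustered"]
        near = [j for j in unclustered
                if cosine(docs[seed], docs[j]) >= min_corr]
        # Density test: on failure, flag the selected document "loose".
        if len(near) < min_density:
            status[seed] = "loose"
            continue
        # Form the centroid of the subset above cutoff, correlate it with
        # all documents, and flag those above the cutoff "clustered".
        c = centroid([docs[j] for j in near])
        members = [j for j in range(len(docs))
                   if cosine(c, docs[j]) >= min_corr]
        for j in members:
            status[j] = "clustered"
        class_vectors.append(c)
        categories.append(members)

    loose = [j for j, s in enumerate(status) if s == "loose"]

    # Assign documents not above any cutoff by maximum correlation.
    assigned = {j for members in categories for j in members}
    if class_vectors:
        for j in range(len(docs)):
            if j not in assigned:
                best = max(range(len(class_vectors)),
                           key=lambda k: cosine(class_vectors[k], docs[j]))
                categories[best].append(j)

    return class_vectors, categories, loose


# Toy data: two tight pairs and one document between them.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [1.0, 0.8]]
class_vectors, categories, loose = classify(docs)
print(categories, loose)
```

In the toy data the fifth document fails the density test, is flagged "loose", and is attached to a category only in the final assignment step, which mirrors the last box of the flowchart before printing.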