ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
by this cluster algorithm are:
1) the number of partitions (categories);
2) the minimum and maximum size of a category;
3) the parameters which define an acceptable category (the density
test).
Normal Two-Level Search:
The data required to carry out this process are completely generated by
the application of the clustering algorithm to the document collection.
Certain documents called "loose" will not be classified into any category.
In order to have the same documents included in the set of document clusters
constructed for each search procedure, a loose document is associated with
that category whose centroid vector exhibits the maximum correlation with
the given loose document. The parameters that can be varied in the
construction of the document clusters for the normal two-level search are:
1) the number of document clusters;
2) the size of a cluster;
3) the parameters for the density test.
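
The association of loose documents with categories described above can be
illustrated by a minimal sketch. It assumes that documents and centroids are
term-weight vectors of equal length and that "correlation" is the cosine
correlation; the text here does not name the measure. The function names
cosine and assign_loose, the identifiers, and the sample vectors are
hypothetical and serve only as an illustration.

```python
import math


def cosine(u, v):
    """Cosine correlation of two equal-length term-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # an all-zero vector correlates with nothing


def assign_loose(loose_docs, centroids):
    """Associate each loose document with the category of maximum centroid correlation.

    loose_docs: dict mapping document id -> vector
    centroids:  dict mapping category id -> centroid vector
    Ties go to the category that appears first in `centroids`.
    """
    assignment = {}
    for doc_id, vec in loose_docs.items():
        assignment[doc_id] = max(centroids, key=lambda c: cosine(vec, centroids[c]))
    return assignment


# Hypothetical data: two categories and two loose documents.
centroids = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 1.0]}
loose_docs = {"d1": [0.9, 0.1, 0.0], "d2": [0.0, 0.2, 0.8]}
result = assign_loose(loose_docs, centroids)
```

With these sample vectors, d1 is associated with category A and d2 with
category B, so every document belongs to some cluster before the two-level
search is run.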
Modified Two-Level Search:
The implementation proceeds in three steps:
1) the collection of previous queries introduced into the system is
partitioned using the standard clustering algorithm, and all
loose queries are eliminated since they are statistically of no
consequence; the composition of the query clusters can be varied
in the following manner:
a) the number of query clusters;