ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
1) the collection of previous queries introduced into the system
is partitioned into subsets of queries using a standard clustering
algorithm [1,6];
2) an associated subset of documents is formed for each subset of
queries constructed in step 1); the associated subset of documents
consists of all documents which are highly correlated with at
least one query contained in the subset of queries;
3) all documents which are not associated with any query cluster
by step 2) are divided into subsets using a standard clustering
algorithm.
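
Steps 1) to 3) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the report's implementation: the cosine correlation, the single-pass routine standing in for the unspecified "standard clustering algorithm", the function names, and both thresholds are hypothetical choices made for the example.

```python
import math


def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def single_pass_clusters(vectors, threshold):
    """Assumed stand-in for the 'standard clustering algorithm'.

    Each item joins the first cluster whose seed (first member) it
    correlates with at or above `threshold`; otherwise it starts a new one.
    """
    clusters = []
    for key, vec in vectors.items():
        for members in clusters:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(key)
                break
        else:
            clusters.append([key])
    return clusters


def build_request_clusters(queries, docs, cluster_threshold, assoc_threshold):
    # Step 1: partition the previous queries into subsets.
    query_clusters = single_pass_clusters(queries, cluster_threshold)
    # Step 2: for each query subset, collect every document that is highly
    # correlated with at least one query in that subset.
    associated = [
        {d for d, dv in docs.items()
         if any(cosine(queries[q], dv) >= assoc_threshold for q in qc)}
        for qc in query_clusters
    ]
    # Step 3: cluster the documents not associated with any query subset.
    covered = set().union(*associated)
    leftover = {d: v for d, v in docs.items() if d not in covered}
    leftover_clusters = single_pass_clusters(leftover, cluster_threshold)
    return query_clusters, associated, leftover_clusters
```

Note that a document may be associated with more than one query subset, since step 2) only requires a high correlation with at least one query in each subset.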
The multi-level search previously described is then modified to take
into account this new request clustering procedure. The modified two-level
search algorithm proceeds as follows. The new query is first correlated
against the centroid vectors of the clusters of previous queries. If the
new query correlates highly with at least one query centroid vector, it is
matched against each document contained in the associated document subsets
of every highly correlated query centroid. Otherwise, the new query is
matched against the centroid vectors of the subsets of non-associated
documents constructed in step 3); for each subset whose centroid vector
correlates highly with the query, the query is matched against every
document contained in the subset.
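
The search procedure just described might look as follows in Python. This is a sketch only: the cosine correlation, the mean-vector centroid, the single shared threshold, and the names `centroid` and `two_level_search` are assumptions introduced for illustration and do not come from the report.

```python
import math


def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def centroid(vectors):
    """Mean of a non-empty list of sparse vectors (assumed centroid definition)."""
    total = {}
    for vec in vectors:
        for t, w in vec.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}


def two_level_search(query, query_clusters, leftover_clusters, docs, threshold):
    """Return (doc_id, correlation) pairs, best match first.

    query_clusters:    list of (query_centroid, associated_doc_ids)
    leftover_clusters: list of (doc_centroid, doc_ids) for non-associated docs
    """
    # Level 1: compare the new query with the query-cluster centroids.
    hits = [ids for cen, ids in query_clusters
            if cosine(query, cen) >= threshold]
    if not hits:
        # No query centroid correlates highly: fall back to the centroids
        # of the non-associated document subsets.
        hits = [ids for cen, ids in leftover_clusters
                if cosine(query, cen) >= threshold]
    # Level 2: match the query against every document in the selected subsets.
    candidates = set().union(*hits)
    scored = [(d, cosine(query, docs[d])) for d in candidates]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))
```

The fallback is triggered by the absence of a highly correlated query centroid, not by an empty candidate set, which follows the "if ... otherwise" wording of the procedure.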
This new clustering algorithm can be further modified by incorporating
user relevance judgments for each previous query introduced into
the system. In step 2), instead of associating all documents that
correlate highly with a query in the subset, it is possible to
associate only those documents considered relevant to the query by the user.
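
A minimal sketch of this variant of step 2) is given below, assuming the relevance judgments are available as a mapping from each previous query to the set of documents its user marked relevant. The function name `associate_by_relevance` and the data layout are hypothetical.

```python
def associate_by_relevance(query_clusters, relevant):
    """Variant of step 2) driven by user relevance judgments.

    query_clusters: list of lists of query ids (the output of step 1)
    relevant:       dict mapping a query id to the set of document ids the
                    user judged relevant to that query (assumed layout)
    """
    return [
        # Union of the judged-relevant documents over the queries in a subset;
        # queries without recorded judgments contribute nothing.
        set().union(*(relevant.get(q, set()) for q in qc))
        for qc in query_clusters
    ]
```

Under this variant a query subset whose queries have no recorded judgments receives an empty associated document subset, so its documents fall through to step 3).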