ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
vi'-6
Consider as an example, a new query (0) introduced into the system,
and assume that the new query lies between two document clusters. In the
normal two level search scheme both document clusters will therefore have
to be searched completely. However since the given query lies in a cluster
of previous queries the centroid vector of only one query cluster will
correlate highly with it. Therefore in the modified two-level search
scheme, only the subset of documents which is associated with the one
highly correlated query cluster needs to be searched completely. The
question now arises as to why a query ([OCRerr] which lies between two query
clusters could not occur with the same frequency as the query (0) used in
the ex[OCRerr]rnple. Two basic assumptions underlie the modified two level
search:
1) The set of previous query vectors to the system form dense subsets
in the n-dimensional space so that the set of previous queries
can be clustered into subsets based on the information content of
each query;
2) A new query is on a statistical basis likely to be similar to a
subset of [OCRerr]revious queries so that the new query can be assigned
to a subset of previous queries created by the clustering algorithm.
If these two assumptions are true, it is much less likely for a new
query to lie between two query clusters than between two document
clusters.