ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
v'I-5
3. Advantages of the Query clustering System
The modified two-level search algorithm should in practice be more
efficient than the normal multi-level search, since f[OCRerr]Ter subsets of
documents will have to be compared completely with the requests in the
former. This may be expected to be true because a new query [OCRerr]nll not be
matched against the centroid vectors of information dense subsets of
documents, but against the centroid vectors of subsets of previous queries
to the system. The matching of the new query against the centroid vectors
of previous queries should cause a natural association of this new query
with one particular subset of queries, and therefore fewer documents will
have to be compared to satisfy the user's request. An illustration of
this situation is shown in Figure 1.
Obviously, in an information retrieval system where the document
collection because of its size must be stored in external memory (e.g. dis[OCRerr],
drum, data cell), the number of categories which need to be completely
searched becomes a critical time factor. Each time a new category is
searched, the documents which are associated with this category must be
obtained from external storage. This data handling operation requires a
considerable amount[OCRerr]of time, so that its minimization is an important
factor in speeding up the retrieval process.
o¼oxx,o[OCRerr] x
xx xx xx(:xo OO[OCRerr]OX X)[OCRerr]XX>OXx[OCRerr] x;oox\)
t[OCRerr]xX\OXI OXIA\x
[OCRerr] w;[OCRerr])(O[OCRerr]xxo
x represents a document
o represents a query
represents a document cluster
represents a query cluster and the subset of documents which
is associated with the query cluster
Advantages of Query Clustering
Figure 1