ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
b) the size of each query cluster;
c) the criterion for an acceptable query cluster.
2) the document collection is partitioned into a set of associated
and non-associated documents based on the query clusters in
step 1); the formation of subsets of associated documents for
each given query cluster can take place in one of three ways:
a) the associated subset of documents for the given query
cluster is formed by associating documents which correlate
highly with a query contained in the given query cluster;
b) the associated subset of documents for the given query
cluster is formed by associating documents which correlate
highly with the centroid vector of the given query cluster;
c) the associated subset of documents for the given query
cluster is formed by associating documents which are
judged by a user to be relevant to his query contained in the
given query cluster.
The size of the subsets of associated documents depends on what
is meant by "highly correlated"; the size can be determined in
one of two ways:
a) the size depends on the number of queries contained in the
given query cluster in the sense that the greater the
number of queries contained in a query cluster the greater
the number of documents that are associated with a query
cluster (this method of determining the size of the
associated categories is rationalized by the expectation
that certain areas of information will more often contain
relevant documents for the query, so that these information
areas should be larger);
b) the size depends on the density of the documents which
surround the query in the n-dimensional space; that is,
the higher the correlation of the documents with the
query the more documents are associated with the query.
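The three ways of forming an associated subset in step 2), and the two ways of fixing its size, can be illustrated with a minimal Python sketch. The report does not give an implementation, so everything here is an assumption made for illustration: the cosine function stands in for the correlation measure, the centroid is taken as the component-wise mean of the query vectors, size rule a) is read as "a fixed number of documents per query in the cluster" (the hypothetical parameter per_query), and size rule b) is read as a fixed correlation cutoff (the hypothetical parameter threshold), so that denser regions of the space yield larger subsets. All function names and the sample vectors are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine correlation between two equal-length term vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of the query vectors in one cluster (assumed definition)."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def associate_by_query(docs, cluster, threshold):
    """Way a): keep documents that correlate highly with some query in the cluster.
    A fixed cutoff also realizes size rule b): denser regions give larger subsets."""
    return {name for name, vec in docs.items()
            if any(cosine(vec, q) >= threshold for q in cluster)}

def associate_by_centroid(docs, cluster, threshold):
    """Way b): keep documents that correlate highly with the cluster centroid."""
    c = centroid(cluster)
    return {name for name, vec in docs.items() if cosine(vec, c) >= threshold}

def associate_by_judgment(judged_relevant, query_ids):
    """Way c): union of the documents users judged relevant to the cluster's queries."""
    subset = set()
    for qid in query_ids:
        subset |= judged_relevant.get(qid, set())
    return subset

def top_k_by_cluster_size(docs, cluster, per_query):
    """Size rule a): the subset grows with the number of queries in the cluster.
    Takes the per_query * len(cluster) documents closest to the centroid."""
    c = centroid(cluster)
    k = per_query * len(cluster)
    ranked = sorted(docs, key=lambda name: cosine(docs[name], c), reverse=True)
    return ranked[:k]

# Hypothetical three-term collection and a two-query cluster.
docs = {
    "d1": [1, 0, 0],
    "d2": [1, 1, 0],
    "d3": [0, 1, 1],
    "d4": [0, 0, 1],
}
cluster = [[1, 0, 0], [1, 1, 0]]
judged = {"q1": {"d1"}, "q2": {"d2", "d3"}}
```

With this sample data, ways a) and b) at a cutoff of 0.7 both associate d1 and d2 with the cluster, whereas way c) also admits d3 because a user judged it relevant. Under size rule a), the two-query cluster receives twice as many documents as a single-query cluster would for the same per_query value.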