ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
set; therefore, a collection of 200 queries was constructed
to simulate the first assumption. The idea motivating this
technique was to produce a query vector which was similar to
the initial query vector, but would possibly have different
concepts and weights. It was felt that this perturbation of
the initial query would simulate a set of different users
phrasing the same type of query.
2) the second collection of queries used to simulate the first
assumption consisted of the first set of 25 queries.
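
The perturbation idea in item 1) can be illustrated with a minimal sketch. This excerpt does not spell out the rule actually used to perturb the queries, so everything below is an assumption made for illustration: the function names `perturb_query` and `simulate_users`, the parameters `drop_prob`, `add_prob`, and `jitter`, and the representation of a query as a mapping from concept to weight are all hypothetical.

```python
import random

def perturb_query(query, vocabulary, rng, drop_prob=0.2, add_prob=0.2, jitter=0.25):
    """Return a variant of `query` (dict: concept -> positive weight).

    Hypothetical rule, not taken from the report: randomly drop some
    concepts, rescale the remaining weights, and sometimes add one new
    concept. One original concept is always kept so that the variant
    stays similar to the initial query.
    """
    concepts = list(query)
    keep = rng.choice(concepts)  # this concept is never dropped
    variant = {}
    for c in concepts:
        if c != keep and rng.random() < drop_prob:
            continue  # simulate a user who omits this concept
        variant[c] = query[c] * (1.0 + rng.uniform(-jitter, jitter))
    if rng.random() < add_prob:
        candidates = [c for c in vocabulary if c not in query]
        if candidates:
            new = rng.choice(candidates)  # simulate a user who adds a concept
            variant[new] = min(query.values()) * (1.0 + rng.uniform(-jitter, jitter))
    return variant

def simulate_users(queries, vocabulary, variants_per_query, seed=0):
    """Build a larger query collection by perturbing each initial query."""
    rng = random.Random(seed)
    return [perturb_query(q, vocabulary, rng)
            for q in queries
            for _ in range(variants_per_query)]
```

Each variant shares at least one concept with its initial query while possibly differing in the remaining concepts and in all weights, which is the property the text asks of the perturbed queries.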
The data for the modified two-level search was constructed by
considering the two collections of queries described above as collections
of previous queries introduced into the system. The following procedures
were carried out for both collections of queries:
1) the standard clustering algorithm was used to partition the
set of previous queries into sets of 6 and 8 clusters;
2) the subset of associated documents for each query cluster was
constructed by associating all those documents which correlated
highly with the given query centroid vector; the size of the
associated subset of documents depended on the number of queries
contained in the given query cluster. Making the size dependent
on the magnitude of the document correlations with the centroid
vector was also tried, but for the document collection used the
associated subsets of documents turned out to have the same
size for either procedure. This procedure was then repeated
for the 6 and 8 clusters of queries.
3) the non-associated documents resulting from step 2) were
clustered into two categories; this was done so that the document
collection was partitioned into sets of 8 and 10 clusters.
Therefore, the number of categories for the modified and normal
two-level search schemes was equal.
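
Steps 2) and 3) above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as implemented in the report: the query clusters from step 1) are taken as given, cosine correlation is assumed as the correlation measure, the hypothetical parameter `docs_per_query` makes each associated subset proportional in size to its query cluster, a document is allowed to appear in more than one associated subset, and `split_in_two` is a simple stand-in for the clustering of the non-associated documents. All function names are hypothetical.

```python
from math import sqrt

def cosine(a, b):
    """Cosine correlation between two sparse vectors (dict: concept -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def associate_documents(query_clusters, docs, docs_per_query=2):
    """Step 2: for each query cluster, keep the documents correlating most
    highly with its centroid; subset size grows with the cluster size."""
    subsets = []
    for cluster in query_clusters:
        c = centroid(cluster)
        ranked = sorted(docs, key=lambda d: cosine(docs[d], c), reverse=True)
        size = min(len(docs), docs_per_query * len(cluster))
        subsets.append(ranked[:size])
    return subsets

def split_in_two(doc_ids, docs):
    """Step 3 stand-in: split leftover documents around two seed documents."""
    if len(doc_ids) < 2:
        return [list(doc_ids), []]
    seed_a = doc_ids[0]
    # Second seed: the leftover document least correlated with the first seed.
    seed_b = min(doc_ids[1:], key=lambda d: cosine(docs[d], docs[seed_a]))
    groups = [[], []]
    for d in doc_ids:
        nearer_b = cosine(docs[d], docs[seed_b]) > cosine(docs[d], docs[seed_a])
        groups[1 if nearer_b else 0].append(d)
    return groups

def build_categories(query_clusters, docs, docs_per_query=2):
    """Associated subsets (one per query cluster) plus two leftover categories."""
    subsets = associate_documents(query_clusters, docs, docs_per_query)
    associated = {d for s in subsets for d in s}
    leftover = [d for d in docs if d not in associated]
    return subsets + split_in_two(leftover, docs)
```

With this construction, k query clusters always yield k + 2 document categories, which mirrors the text: 6 and 8 query clusters give partitions of 8 and 10 categories.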