Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
By these procedures, 6 different sets of classification vectors and
associated documents were constructed:
1) 8 categories, based on clustering documents;
2) 8 categories, based on clustering of 200 random queries;
3) 8 categories, based on clustering of the first 25 queries;
4) 10 categories, based on clustering documents;
5) 10 categories, based on clustering of 200 random queries;
6) 10 categories, based on clustering of the first 25 queries.
C) Experimental Evaluation
The 6 sets of classification vectors together with their associated
document subsets are used with the sample collection of 35 queries to compare
the search efficiency of the modified two-level search scheme with that of
the normal two-level search process.
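
For orientation, a two-level search over classification vectors and their associated document subsets can be sketched as follows. This is a minimal illustration and not the procedure used in the report: the cosine correlation, the `top_categories` parameter, the use of a comparison count as a stand-in for search efficiency, and all sample vectors are assumptions introduced here.

```python
import math

def cosine(u, v):
    """Cosine correlation between two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def two_level_search(query, categories, documents, top_categories=1):
    """Return (scored documents, number of vector comparisons made).

    categories: list of (classification_vector, [doc_id, ...])
    documents:  dict mapping doc_id -> term-weight vector
    """
    # Level 1: correlate the query with every classification vector.
    ranked = sorted(categories, key=lambda c: cosine(query, c[0]), reverse=True)
    comparisons = len(categories)
    # Level 2: correlate the query only with documents of the best categories.
    candidates = {d for _, ids in ranked[:top_categories] for d in ids}
    comparisons += len(candidates)
    scored = sorted(((cosine(query, documents[d]), d) for d in candidates),
                    reverse=True)
    return scored, comparisons

# Hypothetical toy data: two categories of three documents each.
documents = {
    "d1": {"retrieval": 1.0, "search": 1.0},
    "d2": {"retrieval": 1.0, "index": 1.0},
    "d3": {"index": 1.0, "file": 1.0},
    "d4": {"cluster": 1.0, "vector": 1.0},
    "d5": {"cluster": 1.0, "centroid": 1.0},
    "d6": {"centroid": 1.0, "vector": 1.0},
}
categories = [
    ({"retrieval": 2.0, "search": 1.0, "index": 1.0}, ["d1", "d2", "d3"]),
    ({"cluster": 2.0, "vector": 1.0, "centroid": 1.0}, ["d4", "d5", "d6"]),
]
query = {"retrieval": 1.0, "search": 1.0}
results, comparisons = two_level_search(query, categories, documents)
```

In this toy case the search touches two classification vectors and three documents, whereas a full search would correlate the query with all six documents.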
In order to generate the criteria for search effectiveness, the normal
mode of querying is altered; the user's request in this modified querying
system consists of a query, a value for the correlation cutoff, and a
value for the number of documents to be retrieved. Therefore, the sample
collection of queries cannot be used directly as test data (i.e., user
requests) in the modified querying system, since for each query neither the
value for the correlation cutoff nor the value for the number of documents
to be retrieved is specified. In an actual information system, these two
parameters vary according to the needs of the particular user (e.g., the set
of parameters for high recall differs from that for high precision).
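
The three-part user request described here can be pictured with a small sketch. The names `UserRequest` and `select`, the rule for combining the two parameters (discard documents below the cutoff, then keep at most the requested number), and the sample correlations are all assumptions made for illustration; the report does not specify them at this point.

```python
from dataclasses import dataclass

@dataclass
class UserRequest:
    query: dict     # term -> weight
    cutoff: float   # correlation cutoff
    max_docs: int   # number of documents to be retrieved

def select(request, scored_docs):
    """Keep documents at or above the cutoff, best first, up to max_docs.

    scored_docs: list of (correlation, doc_id) pairs.
    """
    kept = [(s, d) for s, d in scored_docs if s >= request.cutoff]
    kept.sort(reverse=True)
    return [d for _, d in kept[:request.max_docs]]

# Hypothetical correlations of four documents with one query.
scored = [(0.82, "d7"), (0.35, "d2"), (0.61, "d4"), (0.10, "d9")]
q = {"retrieval": 1.0}

# Assumed parameter sets: strict for high precision, loose for high recall.
high_precision = UserRequest(query=q, cutoff=0.6, max_docs=2)
high_recall = UserRequest(query=q, cutoff=0.1, max_docs=10)

precise = select(high_precision, scored)
broad = select(high_recall, scored)
```

Under these assumed settings the strict request returns only the two most highly correlated documents, while the loose request returns all four.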
Therefore, it is felt that the assignment to these two parameters of