Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
3) the set of non-associated documents is partitioned using the
standard clustering algorithm, and all loose documents are
associated with the nearest partition; this guarantees that
every document is included in at least one category; the
document clusters should be constructed in the same manner
as the document clusters used for the two-level search
scheme.
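Step 3 can be illustrated with a minimal sketch. It assumes that documents are sparse term-weight vectors (Python dicts), that the clustering algorithm has already produced the partitions, that "nearest partition" means the partition whose centroid has the highest cosine correlation with the loose document, and that centroids are computed before any loose document is added. The function names are hypothetical, and the clustering step itself is not shown.

```python
import math
from typing import Dict, List

# Hypothetical representation: term -> weight.
Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine correlation between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(docs: List[Vector]) -> Vector:
    """Average term-weight vector of one partition (assumed non-empty)."""
    total: Vector = {}
    for d in docs:
        for t, w in d.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(docs) for t, w in total.items()}


def attach_loose(partitions: List[List[Vector]],
                 loose: List[Vector]) -> List[List[Vector]]:
    """Add each loose document to the partition with the most similar centroid,
    so that every document ends up in at least one category.
    Centroids are fixed before any loose document is attached, and the input
    partitions are copied rather than modified."""
    cents = [centroid(p) for p in partitions]
    result = [list(p) for p in partitions]
    for doc in loose:
        best = max(range(len(cents)), key=lambda i: cosine(doc, cents[i]))
        result[best].append(doc)
    return result


partitions = [[{"a": 1.0}], [{"b": 1.0}]]
loose = [{"a": 0.9, "c": 0.1}, {"b": 2.0}]
result = attach_loose(partitions, loose)
```

Fixing the centroids first keeps the assignment independent of the order in which loose documents are processed. Recomputing them after each addition would be an equally plausible variant.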
In the experimental program, emphasis has been placed on the
various parameters that must be adjusted. To compare the
alternative search procedures validly, it is necessary either to
choose, for each search scheme, the set of parameters that
maximizes the effectiveness of that scheme on the test data base,
or to define rules by which the value of each parameter can be
calculated for any data base.
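The first option, choosing for each search scheme the parameter set that maximizes effectiveness on the test data base, can be read as an exhaustive search over a grid of candidate values. The sketch below assumes such a grid and a caller-supplied `evaluate` function that scores a parameter setting (for example, by precision and recall on the queries that have relevance judgments). The parameter names and the toy scoring function are hypothetical stand-ins for whatever the experiments actually measured.

```python
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Params = Dict[str, float]


def best_parameters(grid: Dict[str, Sequence[float]],
                    evaluate: Callable[[Params], float]) -> Tuple[Params, float]:
    """Try every combination of candidate values in `grid` and return the
    combination with the highest effectiveness score, together with that score."""
    names = list(grid)
    best_params: Params = {}
    best_score = float("-inf")
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical parameters and a toy scoring function peaked at (3, 0.2);
# in practice `evaluate` would run the search scheme on the test collection.
grid = {"clusters_searched": [1, 2, 3, 4], "min_similarity": [0.1, 0.2, 0.3]}


def toy_eval(p: Params) -> float:
    return -(p["clusters_searched"] - 3) ** 2 - (p["min_similarity"] - 0.2) ** 2


params, score = best_parameters(grid, toy_eval)
```

The search would be run once per scheme, modified and normal, so that each is compared at its own best setting. The second option in the text, rules that derive parameter values from properties of any data base, would replace this search with a formula and is not sketched here.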
D) Test Data Base
The document and query collection used to evaluate the effectiveness of
the modified versus the normal two-level search must meet the following
requirements:
1) the collection of queries should be real user requests obtained
from an actual document retrieval system;
2) the collection of queries should be large enough that information-dense
subsets can exist among the queries;
3) relevance judgments should exist for at least a part of the query
collection (this provides a control sample of queries which
allows the testing of the modified versus normal two-level
search scheme for retrieval of relevant documents);
4) the document collection should contain dense areas of information;
otherwise, the normal two-level search scheme cannot be efficiently
implemented.