Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
these parameters are arbitrary, so that in order to compare alternative
search procedures validly, the parameters would have to be adjusted
to maximize the effectiveness of each search procedure. Therefore, a
different algorithm, which is a function of neither the number of clusters
nor the size of a cluster, is used to calculate the number of documents
to be completely searched.
In order to generate the criterion for search effectiveness, the
normal procedure for querying a document collection is altered. Normally,
a user request consists of only a query together with a cut-off value
for the correlation coefficient (only documents which correlate with the
query above the cut-off value are retrieved); here an additional
parameter is included, which specifies the number of documents to be
retrieved. In this modified querying system, each search procedure is
altered so that it terminates as soon as the specified number of
documents has been retrieved. This modification permits the comparison
of the minimum number of documents each search procedure must scan in
order to satisfy the modified user request. There must also be available
some measure of the extent of relevance of the documents retrieved by
the alternative search procedures in relation to the documents retrieved
by a full search of the document collection.
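The modified request can be illustrated with a minimal sketch. It is not the procedure used in this report. It assumes a cosine correlation coefficient, and the names search, scan_order, cutoff and wanted, the two scanning orders, and the five toy document vectors are all hypothetical. The sketch shows only how two procedures that differ in the order in which they scan documents can be compared by the number of documents each must scan before the request is satisfied.

```python
import math


def cosine(u, v):
    """Cosine correlation between two term-weight vectors (assumed coefficient)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def search(query, docs, scan_order, cutoff, wanted):
    """Scan documents in scan_order and stop once `wanted` documents
    correlate with the query above `cutoff`.

    Returns (ids of retrieved documents, number of documents scanned).
    """
    retrieved = []
    scanned = 0
    for doc_id in scan_order:
        if len(retrieved) >= wanted:
            break  # modified request satisfied: terminate the search
        scanned += 1
        if cosine(query, docs[doc_id]) > cutoff:
            retrieved.append(doc_id)
    return retrieved, scanned


# Hypothetical toy collection: five documents over three index terms.
docs = {
    "d1": [1, 0, 0],
    "d2": [0, 1, 0],
    "d3": [1, 1, 0],
    "d4": [0, 0, 1],
    "d5": [1, 0, 1],
}
query = [1, 0, 0]

# Two hypothetical scanning orders: the stored order of the collection, and
# an order in which the documents of a promising cluster are scanned first.
sequential_order = ["d1", "d2", "d3", "d4", "d5"]
cluster_first_order = ["d1", "d3", "d5", "d2", "d4"]

sequential_result = search(query, docs, sequential_order, cutoff=0.5, wanted=2)
cluster_result = search(query, docs, cluster_first_order, cutoff=0.5, wanted=2)

print(sequential_result)  # (['d1', 'd3'], 3)
print(cluster_result)     # (['d1', 'd3'], 2)
```

In this toy case both orders retrieve the same two documents, but the cluster-first order scans two documents where the sequential order scans three. The number of documents scanned is the quantity that the modified request makes comparable across search procedures.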
Rocchio [6], in comparing the effectiveness of a two-level search
algorithm based on his clustering algorithm against the effectiveness of a
full search of the document collection, uses the following criteria:
1) the "consistency of retrieval with respect to all documents,"
i.e. the extent to which the reduced search leads to the
retrieval of the same documents as the full search;