ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
2) the [OCRerr] of retrieval with respect to relevant documents,
i.e. the extent to which the retrieval of the relevant documents
is altered by the reduced search.
The above criteria are based on the amount of information lost when
the documents are retrieved by a partial search of the document collection
instead of by a full search. It is believed that in conjunction the two
criteria for effectiveness provide adequate data for an appraisal of the
modified two-level search scheme compared with the normal two-level search
scheme.
In the modified querying system proposed for testing, Rocchio's two
criteria take the following form:
1) the overlap percentage between the retrieved set of documents
obtained by the partial search with the first n* documents
retrieved by the full search;
2) the normal recall, i.e. the number of relevant documents
retrieved by the partial search, expressed as a percentage of
the number of relevant documents contained in the first n*
documents retrieved by the full search.
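The two measures can be illustrated with a minimal sketch. It is not taken from the report: the function names, the argument names and the sample document identifiers are hypothetical. Two assumptions are made: the overlap is expressed as a percentage of the first n* full-search documents, and the recall numerator counts every relevant document retrieved by the partial search, following the wording of criterion 2.

```python
def overlap_percentage(partial_retrieved, full_ranked, n_star):
    """Criterion 1 (assumed denominator: the first n_star full-search documents)."""
    top_full = set(full_ranked[:n_star])
    if not top_full:
        return None  # undefined when the full search returns nothing
    shared = top_full & set(partial_retrieved)
    return 100.0 * len(shared) / len(top_full)


def normal_recall(partial_retrieved, full_ranked, relevant, n_star):
    """Criterion 2: relevant documents found by the partial search, as a
    percentage of the relevant documents among the first n_star documents
    of the full search."""
    relevant = set(relevant)
    relevant_in_full = set(full_ranked[:n_star]) & relevant
    if not relevant_in_full:
        return None  # undefined when the full search finds no relevant document
    relevant_in_partial = set(partial_retrieved) & relevant
    return 100.0 * len(relevant_in_partial) / len(relevant_in_full)


# Hypothetical data: a full-search ranking, a partial-search result, relevance judgments.
full_ranked = ["d1", "d2", "d3", "d4", "d5", "d6"]
partial_retrieved = ["d1", "d3", "d6", "d7"]
relevant = {"d1", "d2", "d5"}

print(overlap_percentage(partial_retrieved, full_ranked, 4))       # 50.0
print(normal_recall(partial_retrieved, full_ranked, relevant, 4))  # 50.0
```

Under this literal reading the recall value can exceed 100 when the partial search retrieves relevant documents that lie outside the first n* documents of the full search. Restricting the numerator to the first n* full-search documents would bound the value at 100.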
C) Implementation of the Normal and Modified Two-Level
Search Schemes
Each search procedure relies heavily on the particular clustering
algorithm used, and the parameters used by the cluster algorithm to
determine how the document collection is to be partitioned. It was
decided, based on a search of the literature, that Rocchio's clustering
algorithm [6] would be the most suitable. The parameters that are used
n* = the number of documents to be retrieved, as originally specified by
the user for the partial search of the document collection.