Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
constant values for all queries would bias the conclusions of the experiment.
In such a case, the conclusions might be valid only for an information
retrieval system whose needs happen to be best served by the particular set of
constants chosen for the experiment. Further, there may be requests
consisting of a query together with fixed parameter values which are not
satisfied by the test document collection.* In this case, the given query
would be useless in the evaluation procedure. Accordingly, it was decided
that a systematic variation of these two parameters for each query
would constitute the best approach, since the effect of varying the two
parameters on alternative search schemes could then be observed, and
average values could be obtained for the criteria over the entire range of
these parameters. Each parameter was in fact varied in the following
manner:
1) the value for the cut-off correlation was made to range from 0.2
to 0.6 in increments of 0.1;
2) the value for the number of documents to be retrieved was made to
range from 3 to 12 in increments of 3.
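As a minimal sketch of this systematic variation, the Python below enumerates every combination of the two parameter ranges for a single query, producing (q, c, n) triplets in the notation of the footnote. The names `CUTOFFS`, `RETRIEVAL_SIZES`, and `search_requests` are hypothetical, and the query is represented by a placeholder label rather than an actual index vector. Five cut-off values times four retrieval sizes gives 20 combinations, matching the 20 search requests per query described next.

```python
from itertools import product

# Cut-off correlation values c: 0.2 to 0.6 in increments of 0.1.
# Integer steps (k / 10) avoid accumulating floating-point drift.
CUTOFFS = [k / 10 for k in range(2, 7)]

# Numbers of documents to be retrieved n: 3 to 12 in increments of 3.
RETRIEVAL_SIZES = list(range(3, 13, 3))


def search_requests(query):
    """Return every (q, c, n) triplet for one query.

    Hypothetical helper: `query` is a placeholder for the query index vector.
    """
    return [(query, c, n) for c, n in product(CUTOFFS, RETRIEVAL_SIZES)]


requests = search_requests("q1")
print(len(requests))  # 20
print(requests[0])    # ('q1', 0.2, 3)
```

Counting with integer steps rather than repeatedly adding 0.1 keeps the five cut-off values free of accumulated rounding error.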
In this framework, the data for evaluating the search effectiveness of
a given search scheme and a given set of classification vectors with their
associated document subsets is generated as follows: for each query (q)
contained in the collection of test queries, a set of 20 search requests
(5 cut-off values times 4 retrieval sizes) is constructed by a systematic
variation of the second and third parameters
as previously described; the following search requests represented as
* A user request is considered as a triplet (q, c, n), where q is a query
index vector, c is the correlation cut-off value, and n is the number of
documents to be retrieved; a request is "satisfied" by the given document
collection if there exist at least n documents in the document collection
whose correlation coefficient with the query q is above c.
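The satisfaction condition in the footnote can be sketched as follows. The report does not name the correlation measure at this point, so the sketch assumes the cosine correlation between term-weight vectors; the functions `cosine` and `is_satisfied` and the toy vectors are hypothetical. A request (q, c, n) counts as satisfied when at least n documents have a correlation with q strictly above c.

```python
import math


def cosine(u, v):
    """Cosine correlation of two equal-length term-weight vectors.

    Assumed measure for illustration; returns 0.0 if either vector is all zeros.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_satisfied(request, documents):
    """True if at least n documents correlate with q strictly above c."""
    q, c, n = request
    above = sum(1 for d in documents if cosine(q, d) > c)
    return above >= n


# Hypothetical toy collection of three document vectors.
docs = [[1, 1, 0], [1, 0, 0], [0, 0, 1]]
query = [1, 1, 0]
print(is_satisfied((query, 0.5, 2), docs))  # True: correlations 1.0 and ~0.71 exceed 0.5
print(is_satisfied((query, 0.5, 3), docs))  # False: only two documents exceed 0.5
```

Under this test, a query whose requests are unsatisfied for some (c, n) pairs still contributes the remaining pairs, which is one practical consequence of varying both parameters rather than fixing them.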