Scientific Report No. ISR-11. Information Storage and Retrieval. A Modified Two-Level Search Algorithm Using Request Clustering. V. R. Lesser. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

constant values for all queries would bias the conclusions of the experiment. In such a case, the conclusions might be valid only for an information retrieval system where the needs are maximized by the particular set of constants chosen for the experiment. Further, there may be requests consisting of a query together with fixed parameter values which are not satisfied by the test document collection.* In this case, the given query would be useless in the evaluation procedure. Accordingly, it was decided that a systematic variation of these two parameters for each query would constitute the best approach, since the effect of varying the two parameters on alternative search schemes could then be observed, and average values could be obtained for the criteria over the entire range of these parameters. Each parameter was in fact varied in the following manner:

1) the value for the cut-off correlation was made to range from 0.2 to 0.6 in increments of 0.1;
2) the value for the number of documents to be retrieved was made to range from 3 to 12 in increments of 3.
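The parameter variation just described can be illustrated with a minimal Python sketch. It is not taken from the report: the names `CUTOFFS`, `RETRIEVE_COUNTS`, `build_requests`, and `is_satisfied` are hypothetical, and the list of correlation coefficients passed to `is_satisfied` is assumed to have been computed elsewhere. The sketch enumerates the 5 cut-off values and 4 retrieval counts, giving 20 requests per query, and applies the footnoted definition of a request (q, c, n) being satisfied.

```python
from itertools import product

# Parameter ranges as stated in the text.
CUTOFFS = [round(0.2 + 0.1 * i, 1) for i in range(5)]  # 0.2, 0.3, 0.4, 0.5, 0.6
RETRIEVE_COUNTS = list(range(3, 13, 3))                # 3, 6, 9, 12


def build_requests(query):
    """Return every (q, c, n) triplet for one query (hypothetical helper)."""
    return [(query, c, n) for c, n in product(CUTOFFS, RETRIEVE_COUNTS)]


def is_satisfied(correlations, c, n):
    """Return True if at least n documents correlate with the query above c.

    `correlations` is an assumed, precomputed list of query-document
    correlation coefficients; how they are computed is outside this sketch.
    """
    return sum(1 for r in correlations if r > c) >= n
```

Under these assumptions, `build_requests` yields 20 triplets for each query, and `is_satisfied` could be used to identify requests that the test collection cannot satisfy.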
In this framework, the data for evaluating the search effectiveness of a given search scheme and a given set of classification vectors with their associated document subsets is generated as follows: for each query (q) contained in the collection of test queries, a set of 20 search requests is constructed by a systematic variation of the second and third parameters as previously described; the following search requests represented as

* A user request is considered as a triplet (q, c, n), where q is a query index vector, c is the correlation cut-off value, and n is the number of documents to be retrieved; a request is "satisfied" by the given document collection if there exist at least n documents in the document collection whose correlation coefficient with the query q is above c.