ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval A Modified Two-Level Search Algorithm Using Request Clustering chapter V. R. Lesser Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

D) Evaluation Results

The improvement in search efficiency by query clustering can be observed in cases 1-6 in Table 1. In all of these cases, the search efficiency as measured by [OCRerr] indicates that the modified two-level search based on query clustering is significantly better than the normal two-level search scheme based on document clustering. The reasons for this improvement in search efficiency can be explained by Table 2: the classification vectors of the categories constructed by query clustering are more highly correlated with the test queries, and they more naturally classify the test query to one particular category. This is indicated by the large differences between the first and second highest correlating classification vectors. These results provide an experimental validation of the theoretical advantages of query clustering as illustrated by Figure 1.

Unfortunately, the other two criteria [OCRerr]T' [OCRerr]T [OCRerr] contradict the general feeling that the higher the query-document correlations (and therefore the larger the value of [OCRerr]), the greater the probability of retrieving relevant documents (and therefore the larger the value of R[OCRerr]). A positive conclusion based on all three criteria for search effectiveness is thus impossible. Still, it is evident that case 5, which is an example of the modified two-level search scheme, is superior to the two examples of the normal two-level search scheme; the values of [OCRerr] and [OCRerr]T for case 5 are much better than for case 1 and case [OCRerr], and the differences in the values of 1% for these three cases are small.
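The observation drawn from Table 2, that a test query correlates strongly with one classification vector and much less with the next best, can be illustrated with a minimal sketch. It assumes the correlation is a cosine measure over term-weight vectors, which the text above does not state. The function names `cosine` and `top_two_margin` and the toy vectors are hypothetical and are not taken from the report.

```python
import math


def cosine(u, v):
    """Cosine correlation between two equal-length term-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_two_margin(query, classification_vectors):
    """Return (best category, its correlation, margin over the runner-up).

    Expects at least two categories in `classification_vectors`.
    """
    scores = sorted(
        ((cosine(query, vec), name) for name, vec in classification_vectors.items()),
        reverse=True,
    )
    (best, name), (second, _) = scores[0], scores[1]
    return name, best, best - second


# Hypothetical toy data: three categories, each represented by one classification vector.
categories = {
    "A": [1.0, 0.0, 0.0],
    "B": [0.0, 1.0, 0.0],
    "C": [0.0, 0.0, 1.0],
}
query = [3.0, 4.0, 0.0]

name, best, margin = top_two_margin(query, categories)
print(name, round(best, 3), round(margin, 3))
```

A large margin means the query is assigned to one category with little ambiguity, so fewer categories need to be expanded in the second level of the search. A small margin suggests that several categories may have to be searched.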
The apparent contradiction caused by differences between the values of P and R[OCRerr] can be resolved if the evaluation results are based only on T requests which retrieve more than three documents. It appears that for requests which retrieve only 3 documents, high overlap between the first 3