ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval A Modified Two-Level Search Algorithm Using Request Clustering chapter V. R. Lesser Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. Table 1: Search Schemes and Ca[OCRerr]egories Evaluated ror Search Effectiveness SEARCH SCHE[OCRerr] SET OF CATECOPIES USED TEST QUTRIES [OCRerr]rn R I 1. Normal T'To-Level 8 Categories, Based on Clustering Documents First 25 Queries 20.12 0.65 0.67 2. Modified [OCRerr]7o-Level 8 Categories, Based on Clustering Random Queries First 25 Queries 16.5h 0.70 0.55 3. Modified T'io-Level 8 Categories, Based on Clustering 25 Queries First 25 Queries 17.25 o.6y 0.60 [OCRerr]. Normal T'.?o[OCRerr]Level 10 Categories, Based on Clustering Documents First 25 Queries 22.17 0.[OCRerr] 0.66 5. Modified ¶[`[OCRerr]7o-Level 10 Categories, Based on Clustering Random Queries First 25 Queries 17.73 0.7h 0.63 6. Modified T'[OCRerr]o-Level 10 Categories, Based on Clustering 25 Queries First 25 Queries 18.56 0.72 0.60 7. Normal T'[OCRerr]-Level 8 Categories, Based on Clustering Documents Last 10 Queries 20.58 0.65 0.37 8. Modified [OCRerr]io-Level 8 Categories, Based on Clustering Random Queries Last 10 Queries 23.96 0.66 0.[OCRerr]5 9. Modified T'[OCRerr]-Level 8 Categories, Based on Clustering 25 Queries Last 10 Queries 25.52 0.63 0.57 LO. Normal [OCRerr][OCRerr]-Level 10 Categories, Based on Clustering Documents Last 10 Queries 22.38 0.63 o.h( Ll. Modified T'[OCRerr]o-Level 10 Categories, Based on Clustering Random Queries Last 10 Queries 23.39 0.70 0.52 L2. Modified [OCRerr]io-Level 10 Categories, Based on Clustering 25 Queries Last 10 Queries 26.70 o.6[OCRerr] 0.53 L3. Normal [OCRerr]7o-Level 8 Categories, Based on Clustering Documents Full 35 Queries 20.29 0.67 0.55 L[OCRerr]. Modified [OCRerr]o-Level 8 Categories, B[OCRerr][OCRerr]sed on Clustering Random Queries Full 35 Queries 18.75 0.68 0.55 L5. Modified T'[OCRerr]-Level 8 Categories, Based on Clustering 25 Queries Full 35 Queries 19.73 o.65 0.58 L6. Normal [OCRerr]7o-Level 10 Categories, Based on Clustering Documents Full 35 Queries 22.23 0.72 0.57 L7. Modified Two-Level 10 Categories, Based on Clustering Random Queries Full 35 Queries 19.hl 0.70 0.60 L8. Modified T'[OCRerr]-Level 10 Categories, Based on Clustering 25 Queries Full 35 Queries 21.01 [OCRerr] 0.60 MR = MT + (the average number of classification vectors which need to be correlated with tamed in the test collection of queries). against each classification vector so that each query con- The normal two-level search scheme always compares the query if 8 categories are used in fhe normal t[OCRerr].[OCRerr]-level search, MR MT + 8. The calculation of [OCRerr] for the modified two-lcvel search is more difficult since it is dependent on the number of categories of associated documents, the number of categoric of non-associated documents, and the particular collection of test queries. Consider the example of 8 categories based on the clustering of the first 25 queries, and the first 25 queries as a test[OCRerr]co1lection; the classification vectors for the subsets of associated documents were formed from the first 25 queries so that each of the first 25 queries correlates highly with at least one of these classification vectors. The modified two- level search scheme can then "satisfy't the request by correlating the query with only the set of classification vectors for the subsets of associated documents. Therefore in this example where there were 6 categories of associated documents, [OCRerr] = + 6. But, if the test collection of queries consisted of the last 10 queries, the set of classification vectors for the non-associated document subsets needs to be correlated with each query in the test collection in order to satisfy the request, and in this case Mp = MT + 8.