ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval A Modified Two-Level Search Algorithm Using Request Clustering chapter V. R. Lesser Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. v'I-5 3. Advantages of the Query clustering System The modified two-level search algorithm should in practice be more efficient than the normal multi-level search, since f[OCRerr]Ter subsets of documents will have to be compared completely with the requests in the former. This may be expected to be true because a new query [OCRerr]nll not be matched against the centroid vectors of information dense subsets of documents, but against the centroid vectors of subsets of previous queries to the system. The matching of the new query against the centroid vectors of previous queries should cause a natural association of this new query with one particular subset of queries, and therefore fewer documents will have to be compared to satisfy the user's request. An illustration of this situation is shown in Figure 1. Obviously, in an information retrieval system where the document collection because of its size must be stored in external memory (e.g. dis[OCRerr], drum, data cell), the number of categories which need to be completely searched becomes a critical time factor. Each time a new category is searched, the documents which are associated with this category must be obtained from external storage. This data handling operation requires a considerable amount[OCRerr]of time, so that its minimization is an important factor in speeding up the retrieval process. o¼oxx,o[OCRerr] x xx xx xx(:xo OO[OCRerr]OX X)[OCRerr]XX>OXx[OCRerr] x;oox\) t[OCRerr]xX\OXI OXIA\x [OCRerr] w;[OCRerr])(O[OCRerr]xxo x represents a document o represents a query represents a document cluster represents a query cluster and the subset of documents which is associated with the query cluster Advantages of Query Clustering Figure 1