ISR11 Scientific Report No. ISR-11 Information Storage and Retrieval A Modified Two-Level Search Algorithm Using Request Clustering chapter V. R. Lesser Harvard University Gerard Salton Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government. v'I-2 full search technique must be discarded. Instead, a multi-level search algorithm based on a partial scan of the document collection can be used. Such a scheme can be based on a clustering algorithm which uses the information content of the documents to partition the document collection into subsets of related documents. The following procedure can then be used to perform a multi-level search based on the previously identified document clusters: 1) the procedure finds those subsets of documents whose represen- tative centroid vectors* are significantly correlated with the given query; 2) the query is then matched against each document contained in the subsets of documents found in step 1). The basic assumption in the multi-level search is that for each new query introduced into the system, the documents which are relevant to this query are contained in only a few of the document clusters. ?Llrther, the centroid vectors of these particular clusters will correlate more highly with the query than the centroid vectors of the document clusters which do not contain any documents relevant to the query. Therefore, the efficiency of the multi-level search is dependent on the number of clusters which contain relevant documents, and on the correlation of the query with the centroid vectors of.relevant document clusters. * (let each document be represented by an n- dimensional index vector; consider a set D of document index vectors; the centroid vector c for the set D is defined as: ln W[OCRerr] n i=l dii where D = ([OCRerr]d1,d2.......