Scientific Report No. ISR-11, Information Storage and Retrieval

A Modified Two-Level Search Algorithm Using Request Clustering

V. R. Lesser, Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

5. Actual Comparisons of the Modified versus the Normal Two-Level Searches

In the previous parts, a method of comparing the alternative search procedures was outlined. The comparison actually carried out did not fully follow the suggested method, for three reasons:

1) a collection of queries created by an actual user population was not available, and the available query collection consisted of only 35 queries;

2) the collection of documents available for these queries consisted of only 82 documents from the ADI collection;

3) so many parameters were involved in implementing each search procedure that an adequate appraisal would have required an excessive amount of computer time.

Within the framework of this limited data base, the following procedure was actually used:

A) Data Generated for Two-Level Search Algorithm

The standard clustering algorithm was used to partition the collection of 82 documents into 8 clusters and into 10 clusters; in each partition the categories (clusters) were approximately equal in size. Attempts to divide the document collection into more than 10 clusters were unsuccessful. The number of categories used is not purely arbitrary, since Rocchio [6] proves that if each document has the same probability of being relevant to the query and the categories are approximately equal in size, then the optimum number of categories is equal to $\sqrt{K \cdot 82}$, where K is the number of categories which must be searched and 82 is the number of documents in the collection. If K = 1, the optimum is $\sqrt{82} \approx 9$ categories, so that the sets of 8 and 10 categories are not unreasonable.
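
The rule attributed to Rocchio [6] can be checked numerically. The sketch below assumes the garbled expression in the text reads as the square root of K times the collection size, a reading that agrees with the stated result of 9 categories for K = 1 and 82 documents. The function name `optimum_categories` and its parameter names are illustrative and do not come from the report.

```python
import math


def optimum_categories(num_documents: int, categories_searched: int) -> float:
    """Estimate the optimum number of equal-sized categories.

    Assumed reading of the rule cited in the text: sqrt(K * N), where
    N is the collection size and K is the number of categories that
    must be searched. Both names are hypothetical.
    """
    if num_documents <= 0 or categories_searched <= 0:
        raise ValueError("both arguments must be positive")
    return math.sqrt(categories_searched * num_documents)


if __name__ == "__main__":
    # The 82-document ADI subset used in the report, for a few values of K.
    for k in (1, 2, 3):
        estimate = optimum_categories(82, k)
        print(f"K = {k}: optimum is about {estimate:.2f} categories")
```

For K = 1 the estimate is about 9.06, which lies between the 8-cluster and 10-cluster partitions actually generated.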