Scientific Report No. ISR-11
Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

B) Data Generated for Modified Two-Level Search Algorithm

Two assumptions are used as the basis for the modified two-level search algorithm:

1) a new query introduced into the system will on a statistical basis be similar to a set of previous queries introduced into the system;

2) a more efficient classification scheme can be constructed if assumption 1) is correct.

It is obvious that 35 queries do not give any indication concerning the truth of the first assumption. In order to carry out the experiment, it was decided to assume the correctness of the first assumption, and to determine instead whether the first assumption implied the second assumption.

Two techniques were used to generate a collection of queries which simulated the first assumption, using in each case the 35 queries partitioned into two sets, the first consisting of 25 queries and the second of 10 queries to be used as a control:

1) for each query in the first set, eight random query vectors were generated whose correlations with the initial query were above 0.7. A random query was generated by correlating the initial query with the whole document collection; the vectors representing the two highest correlated documents were summed together with the initial query vector; each concept in this summed vector was then multiplied by a different random number from 0 to 1. This new vector was normalized and correlated with the initial query; if this correlation was greater than 0.7, then this random vector was added to the query collection. This procedure was used until 8 vectors were generated for each query in the first