Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
B) Data Generated for Modified Two-Level Search Algorithm
Two assumptions are used as the basis for the modified two-level search
algorithm:
1) a new query introduced into the system will, on a statistical
basis, be similar to some set of queries previously introduced into
the system;
2) a more efficient classification scheme can be constructed if
assumption 1) is correct.
A collection of only 35 queries gives no indication of whether the first
assumption is true. To carry out the experiment, the first assumption was
therefore taken to be correct, and the experiment determined instead whether
the first assumption implies the second.
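The first of the two generation techniques listed next perturbs each query in the first set and keeps only those variants that remain close to it. The Python sketch below illustrates that step under several assumptions not stated in the report: "correlation" is taken to be the cosine measure, "normalized" is taken to mean scaling to unit Euclidean length, the random factors are drawn uniformly from [0, 1) with Python's `random` module, and vectors are equal-length lists of concept weights. The function names and the `max_tries` guard are hypothetical additions for illustration only.

```python
import math
import random


def cosine(u, v):
    """Cosine correlation of two equal-length concept vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def normalize(v):
    """Scale a vector to unit Euclidean length (assumed normalization)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def generate_variants(query, documents, count=8, threshold=0.7,
                      rng=None, max_tries=10_000):
    """Hypothetical sketch: build random variants of `query` whose
    correlation with it exceeds `threshold`."""
    rng = rng or random.Random()
    # Correlate the initial query with the whole collection; keep the top two.
    ranked = sorted(documents, key=lambda d: cosine(query, d), reverse=True)
    top_two = ranked[:2]
    # Sum the two best document vectors together with the initial query.
    summed = [q + sum(d[i] for d in top_two) for i, q in enumerate(query)]
    variants = []
    tries = 0  # safety guard, not part of the original procedure
    while len(variants) < count and tries < max_tries:
        tries += 1
        # Multiply each concept by its own random factor, then normalize.
        candidate = normalize([w * rng.random() for w in summed])
        # Accept the variant only if it stays close to the initial query.
        if cosine(candidate, query) > threshold:
            variants.append(candidate)
    return variants
```

Because the two best-matching documents depend only on the initial query, the summed vector is computed once, and the variation between accepted vectors comes entirely from the random multipliers.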
Two techniques were used to generate a collection of queries simulating
the first assumption. Each technique used the 35 queries partitioned
into two sets, the first consisting of 25 queries and the second of 10
queries to be used as a control:
1) for each query in the first set, eight random query vectors were
generated whose correlations with the initial query exceeded 0.7. A
random query was generated as follows: the initial query was correlated
with the whole document collection; the vectors representing the two
most highly correlated documents were summed together with the initial
query vector; and each concept in this summed vector was then multiplied
by a different random number between 0 and 1. The resulting vector was
normalized and correlated with the initial query; if this correlation
exceeded 0.7, the random vector was added to the query collection. This
procedure was repeated until 8 vectors had been generated for each query in the first