ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
documents retrieved in a full search does not necessarily imply a high
recall for the 3 documents as measured by R_T. Table 3 shows the values
for [OCRerr]` [OCRerr]T and [OCRerr] based on requests which retrieve more than 3 documents.
It is felt that this change in the range of the number of documents to be
retrieved does not invalidate the conclusions since in an actual information
retrieval system, the majority of the user requests retrieve more than
3 documents. The results exhibited in Table 3 clearly indicate that if the
collection of new queries introduced into the system is similar to the
collection of previous queries introduced into the system, then the modified
two-level search scheme is more efficient than the normal two-level
search.
The cases 7-12 in Table 1 indicate, as expected, that if a query is
not similar to a subset of previous queries, the normal two-level search
is more effective than the modified two-level search. It is believed that
due to an unanticipated error in the experimental procedures this difference
in search effectiveness is unduly increased. The set of non-associated
documents was found to be approximately equal in size to the set of
associated documents, and was partitioned into only 2 categories. It is
felt in retrospect that this was a mistake, and that if the set of
non-associated documents had been clustered into 4 categories, the search
effectiveness of the modified two-level search scheme for the test collection
of the last 10 queries would have been greatly improved. Four
categories divide this set of non-associated documents into subsets of
documents of the same size as the clusters of associated documents.
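
The size argument can be illustrated with a short calculation. The sketch below is not part of the original experiment: the figure of 100 documents per set is hypothetical, and the count of 4 clusters for the associated documents is an assumption inferred from the statement that four categories would equalize the subset sizes. It only shows that, for two sets of roughly equal size, splitting one into 2 categories yields subsets about twice as large as splitting it into 4.

```python
def mean_subset_size(set_size: int, n_categories: int) -> float:
    """Average number of documents per category for an even split."""
    if n_categories <= 0:
        raise ValueError("n_categories must be positive")
    return set_size / n_categories


# Hypothetical figures, for illustration only (not from the report).
ASSOCIATED_DOCS = 100
NON_ASSOCIATED_DOCS = 100   # "approximately equal in size" to the associated set
ASSOCIATED_CLUSTERS = 4     # assumed, inferred from the text

target = mean_subset_size(ASSOCIATED_DOCS, ASSOCIATED_CLUSTERS)
with_two = mean_subset_size(NON_ASSOCIATED_DOCS, 2)   # partition actually used
with_four = mean_subset_size(NON_ASSOCIATED_DOCS, 4)  # partition suggested in retrospect

print("associated cluster size:", target)
print("non-associated subset size, 2 categories:", with_two)
print("non-associated subset size, 4 categories:", with_four)
```

Under these assumptions, the 2-category partition produces subsets twice the size of an associated cluster, while the 4-category partition matches it.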
The collection of 35 test queries was converted into 231 satisfied
requests, of which 160 requests were produced from the first 25 queries.
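
The request counts quoted above imply, by subtraction, how many satisfied requests came from the last 10 queries. The following sketch works through that arithmetic; the variable names are illustrative, and only the four input figures come from the text.

```python
# Figures stated in the text.
TOTAL_QUERIES = 35
TOTAL_REQUESTS = 231
FIRST_QUERIES = 25
FIRST_REQUESTS = 160

# Derived by subtraction: the remaining queries and their requests.
last_queries = TOTAL_QUERIES - FIRST_QUERIES      # 10
last_requests = TOTAL_REQUESTS - FIRST_REQUESTS   # 71


def requests_per_query(requests: int, queries: int) -> float:
    """Average number of satisfied requests produced per query."""
    return requests / queries


print("requests from the last", last_queries, "queries:", last_requests)
print("average, first 25 queries:", round(requests_per_query(FIRST_REQUESTS, FIRST_QUERIES), 2))
print("average, last 10 queries:", round(requests_per_query(last_requests, last_queries), 2))
print("average, all 35 queries:", round(requests_per_query(TOTAL_REQUESTS, TOTAL_QUERIES), 2))
```

The two subsets of queries therefore yield a similar number of satisfied requests per query, roughly six to seven each.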