ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
It is felt that the partitioning of the document collection by grouping
documents containing similar information identifiers does not always
maximize the efficiency of the multi-level search. This technique of
partitioning is effective when the set of queries introduced into the
system can be divided into groups of queries which roughly correspond in
information content to the subsets of documents previously created by the
clustering algorithm. If this is not the case, the set of relevant documents
for a query will be spread over many document subsets, and the multi-level
search will not prove effective. In practice, it is believed that the
distribution of the information content of the queries may often differ
significantly from that of the document collection. Furthermore, if this
contention is correct, a more efficient classification scheme can be
constructed by considering the information content of queries previously
introduced into the system.
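
The argument above can be illustrated with a small sketch. Everything in it is an assumption made for illustration only: the four toy documents, their term weights, the cosine correlation, the centroid as cluster representative, and the names `two_level_search` and `clusters_touched` do not come from the report. In particular, the "query-oriented" partition is grouped by hand and is not the clustering algorithm described in the following section.

```python
from math import sqrt

def cosine(a, b):
    """Cosine correlation between two sparse term vectors (dict: term -> weight)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean term vector of a group of documents (assumed cluster representative)."""
    total = {}
    for v in vectors:
        for t, w in v.items():
            total[t] = total.get(t, 0.0) + w
    return {t: w / len(vectors) for t, w in total.items()}

def two_level_search(query, docs, clusters, n_clusters=1):
    """Level 1: rank clusters by centroid match. Level 2: rank the documents
    of the n_clusters best clusters against the query."""
    cents = {cid: centroid([docs[d] for d in members])
             for cid, members in clusters.items()}
    best = sorted(cents, key=lambda cid: cosine(query, cents[cid]),
                  reverse=True)[:n_clusters]
    candidates = [d for cid in best for d in clusters[cid]]
    return sorted(candidates, key=lambda d: cosine(query, docs[d]), reverse=True)

def clusters_touched(relevant, clusters):
    """Number of clusters holding at least one relevant document."""
    return sum(1 for members in clusters.values() if set(members) & set(relevant))

# Hypothetical toy collection (illustrative term weights only).
docs = {
    "d1": {"index": 1, "thesaurus": 1},
    "d2": {"index": 1, "search": 1},
    "d3": {"cluster": 1, "search": 1},
    "d4": {"cluster": 1, "matrix": 1},
}
query = {"search": 1.0, "index": 0.5}
relevant = {"d2", "d3"}          # assumed relevance judgments for this query

# Partition by document similarity alone.
doc_clusters = {"A": ["d1", "d2"], "B": ["d3", "d4"]}
# Hand-made partition that keeps documents wanted by the same query together.
query_clusters = {"Q": ["d2", "d3"], "R": ["d1", "d4"]}

for name, part in (("document-based", doc_clusters), ("query-oriented", query_clusters)):
    found = set(two_level_search(query, docs, part)) & relevant
    print(name, "clusters touched:", clusters_touched(relevant, part),
          "relevant found:", sorted(found))
```

With these toy values the document-based partition leaves the two relevant documents in different clusters, so a search restricted to the best-matching cluster retrieves only one of them, whereas the hand-made query-oriented partition places both in the single cluster that is searched.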
In the next few paragraphs, new techniques are described for partitioning
the document collection, and for carrying out the multi-level search,
in accordance with the query set previously introduced into the system.
A possible modification of this partitioning technique, based on
relevance judgments provided by the user, is also described.
2. A Modified Clustering Algorithm and a Corresponding
Two-Level Search Strategy
It is desired to construct clusters of documents as a function of
both the document collection and the collection of queries previously
introduced into the system. The procedure for clustering is
divided into three stages: