ISR11
Scientific Report No. ISR-11 Information Storage and Retrieval
A Modified Two-Level Search Algorithm Using Request Clustering
chapter
V. R. Lesser
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
This evaluation procedure was programmed in Fortran for the CDC 16[OCRerr]
computer. Two additional evaluation criteria not previously mentioned
may also be calculated by this program:
1) C (q, C, n) equal to the number of document categories which
need to be scanned in order to satisfy the request; the quantities
CT, and C (c,n) can then be defined in a manner similar to that
used for and M (c,n);
2) the average correlation value for the set of test queries with the
highest correlating classification vector, the second highest
correlating classification vector, etc.
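The second criterion can be illustrated with a minimal sketch. It assumes that queries and classification vectors are equal-length term-weight vectors and that the correlation is the cosine measure; neither assumption is stated in this section. The function names and the toy data are hypothetical and are not taken from the original Fortran program.

```python
import math

def cosine(u, v):
    """Cosine correlation between two equal-length term-weight vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def average_correlation_by_rank(queries, class_vectors):
    """Return [mean best correlation, mean second-best correlation, ...] over all queries."""
    if not queries:
        return []
    totals = [0.0] * len(class_vectors)
    for q in queries:
        # Rank the classification vectors for this query, highest correlation first.
        ranked = sorted((cosine(q, c) for c in class_vectors), reverse=True)
        for rank, value in enumerate(ranked):
            totals[rank] += value
    return [t / len(queries) for t in totals]

# Hypothetical toy data: two queries, three classification vectors.
queries = [[1, 0, 0], [0, 1, 0]]
class_vectors = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]
averages = average_correlation_by_rank(queries, class_vectors)
```

Here `averages[0]` is the mean correlation with each query's highest correlating classification vector, `averages[1]` the mean for the second highest, and so on.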
Figures 2 and 3 represent typical output of the evaluation program.
The six sets of categories, together with their appropriate search schemes,
are evaluated for search effectiveness using three test collections of queries:
the first 25 queries, the last 10 queries, and the entire set of 35 queries. The
collection of the first 25 queries is intended to represent a set of queries
which is similar to the set of previous queries used to construct the four
sets of categories generated by query clustering. The collection of the
last 10 queries is intended to represent a collection of new queries to the
system which may or may not be similar to the set of previous queries
introduced into the system. The entire collection of 35 queries represents
a composite collection of queries which provides an overall evaluation of
search effectiveness. Table 1 gives the values of the criteria for search
effectiveness for each set of categories and each test query collection.
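The division of the 35 queries into three test collections can be sketched as follows. The sketch assumes that a criterion has already been computed once per query and that the figure reported for a collection is the plain mean over its queries; the function name, the dictionary keys, and the sample scores are hypothetical.

```python
def evaluate_by_collection(per_query_scores, n_old=25):
    """Average a per-query criterion over the old, new, and complete query collections."""
    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")
    return {
        "old": mean(per_query_scores[:n_old]),   # first 25: resemble the clustering queries
        "new": mean(per_query_scores[n_old:]),   # last 10: queries new to the system
        "all": mean(per_query_scores),           # composite collection of all 35
    }

# Hypothetical per-query values of one criterion for 35 queries.
scores = [2.0] * 25 + [4.0] * 10
summary = evaluate_by_collection(scores)
```

Comparing `summary["old"]` with `summary["new"]` indicates how far a set of categories built from earlier queries carries over to unseen ones.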