Scientific Report No. IRS-13 Information Storage and Retrieval
Test Environment
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The comparisons in columns 4 and 5 use the normalized evaluation measures,
while those in columns 3 and 9 use the usual curves of fallout versus recall
and of precision versus recall. The entries in the table show, for example,
that for the ADI comparison the fallout versus recall curve indicates
superior performance for the specific requests, while the precision versus
recall curve shows the general requests to be superior. Any slight crossing
of the curves is ignored in this table.
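The normalized measures are not defined in this passage. As a minimal sketch, assuming they are the normalized recall and normalized precision from Rocchio's work on the SMART system, both can be computed from the ranks r_1, ..., r_n at which the n relevant documents appear in a ranking of all N documents: normalized recall is 1 - (sum of r_i - sum of i) / (n(N - n)), and normalized precision is 1 - (sum of log r_i - sum of log i) / log(N! / ((N - n)! n!)), with i running from 1 to n. The function names below are illustrative and do not come from the report.

```python
import math
from typing import Sequence


def normalized_recall(relevant_ranks: Sequence[int], n_docs: int) -> float:
    """Rocchio-style normalized recall (assumed definition).

    relevant_ranks: 1-based ranks of the n relevant documents in a ranking
    of all n_docs documents; requires 0 < n < n_docs.
    """
    n = len(relevant_ranks)
    ideal = sum(range(1, n + 1))  # ideal ranking places relevant docs at ranks 1..n
    actual = sum(relevant_ranks)
    return 1 - (actual - ideal) / (n * (n_docs - n))


def normalized_precision(relevant_ranks: Sequence[int], n_docs: int) -> float:
    """Rocchio-style normalized precision, log-rank form (assumed definition)."""
    n = len(relevant_ranks)
    actual = sum(math.log(r) for r in relevant_ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of the binomial coefficient N! / ((N - n)! n!), via lgamma to avoid overflow
    log_binom = (math.lgamma(n_docs + 1) - math.lgamma(n_docs - n + 1)
                 - math.lgamma(n + 1))
    return 1 - (actual - ideal) / log_binom


# Best and worst placement of 3 relevant documents among 10.
print(normalized_recall([1, 2, 3], 10), normalized_recall([8, 9, 10], 10))  # prints 1.0 0.0
```

Both measures equal 1 for a perfect ranking and 0 when every relevant document is ranked last, so a single number summarizes the whole ranking instead of a full curve.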
The entries in columns 2, 7 and 8 are explained by the example
given in Fig. 16. Using the IRE-3 results, the detailed performance
results of the specific and general requests are compared at five document
cut-off levels. Although the general requests retrieve a greater number
of relevant documents than the specific requests at each cut-off point,
they achieve a lower recall ratio each time, since the general requests
have many more relevant documents to find. The general requests also
achieve better fallout and precision ratios at each cut-off. Returning to
columns 3 and 7 in Fig. 15, with only one exception the precision versus
recall curve shows the general requests to be best, and the fallout versus
recall curves all favor the specific requests. The exception, noted in the
case of IRE-1, may be explained by the fact that these 17 staff-prepared
requests perform very much better than any other set for any of the
collections, and the useful length of these requests seems to offset the
generality effect that favors the general requests in the set.
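The cut-off argument can be checked with a short calculation of recall, precision and fallout at a fixed cut-off. The counts below are hypothetical (a 100-document collection, a specific request with 4 relevant documents and a general request with 12, both compared at a cut-off of 10); they are not the IRE-3 figures of Fig. 16, and the function name is illustrative.

```python
from typing import NamedTuple


class CutoffResult(NamedTuple):
    relevant_retrieved: int
    recall: float     # relevant retrieved / all relevant
    precision: float  # relevant retrieved / all retrieved
    fallout: float    # nonrelevant retrieved / all nonrelevant


def measures_at_cutoff(relevant_retrieved: int, cutoff: int,
                       total_relevant: int, collection_size: int) -> CutoffResult:
    """Recall, precision and fallout when the top `cutoff` documents are retrieved."""
    nonrelevant_retrieved = cutoff - relevant_retrieved
    total_nonrelevant = collection_size - total_relevant
    return CutoffResult(
        relevant_retrieved=relevant_retrieved,
        recall=relevant_retrieved / total_relevant,
        precision=relevant_retrieved / cutoff,
        fallout=nonrelevant_retrieved / total_nonrelevant,
    )


# Hypothetical counts: 100 documents, cut-off of 10.
specific = measures_at_cutoff(relevant_retrieved=3, cutoff=10,
                              total_relevant=4, collection_size=100)
general = measures_at_cutoff(relevant_retrieved=6, cutoff=10,
                             total_relevant=12, collection_size=100)

for name, r in (("specific", specific), ("general", general)):
    print(f"{name:8s} rel={r.relevant_retrieved} R={r.recall:.3f} "
          f"P={r.precision:.3f} F={r.fallout:.4f}")
```

With these counts the general request retrieves twice as many relevant documents (6 against 3) yet reaches a recall of only 0.5 against 0.75, while its precision (0.6 against 0.3) and fallout (about 0.045 against 0.073) are both better, which is the pattern the text describes.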
This description reveals the difficulties involved in making this
type of test comparison. As is suggested in Section II, part 7 or 8,
user-oriented evaluation seems to be performed best by recognizing two