Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
for test purposes and meets the need of indicating a user-oriented view of
the result; the micro method, on the other hand, tends to give undue weight
to requests that have many relevant documents. As Salton and Rocchio have
shown [i,s], the macro method results in somewhat better precision-recall
curves, but the difference between the two methods with current collections
and requests is close to or below 5%, as seen in the comparison of Figure
11. Occasional use of the micro method has usually given the same relative
merit when two options are compared, so this issue does not affect
comparative test results.
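
The difference between the two averaging methods can be illustrated with a minimal sketch. It assumes the usual reading of the terms: the macro method averages the precision and recall values obtained for each request separately, while the micro method pools the document counts over all requests before forming a single ratio. The function name, the tuple layout, and the sample counts are hypothetical and are not taken from the SMART programs.

```python
def macro_micro(results):
    """Compare macro and micro averages at one cut-off.

    results: one tuple per request,
        (relevant_retrieved, total_retrieved, total_relevant)
    Returns ((macro_precision, macro_recall), (micro_precision, micro_recall)).
    """
    n = len(results)

    # Macro: compute the ratio for each request, then take the arithmetic mean.
    macro_p = sum(rr / ret for rr, ret, _ in results) / n
    macro_r = sum(rr / rel for rr, _, rel in results) / n

    # Micro: pool the counts over all requests, then compute one ratio.
    pooled_rr = sum(rr for rr, _, _ in results)
    micro_p = pooled_rr / sum(ret for _, ret, _ in results)
    micro_r = pooled_rr / sum(rel for _, _, rel in results)

    return (macro_p, macro_r), (micro_p, micro_r)


# Hypothetical data: one request with 2 relevant documents, one with 50,
# each evaluated after 10 documents have been retrieved.
sample = [(2, 10, 2), (5, 10, 50)]
macro, micro = macro_micro(sample)
print("macro (precision, recall):", macro)
print("micro (precision, recall):", micro)
```

In this invented case the macro recall is 0.55, whereas the micro recall is 7/52, or about 0.13, because the pooled ratio is dominated by the request that has 50 relevant documents.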
Further work on the averaging problem may reveal that the arithmetic
mean is not the only suitable method to use. Averaging is a problem simply because
of the extreme variance in individual results, as can be seen from the plot
of individual precision-recall curves for 22 requests given in Figure 12.
The macro evaluation curve for these 22 requests is given in Figure 13,
together with a curve based on the median, rather than the mean. The scatter
of results raises the question of statistical significance; this matter
is discussed elsewhere [9].
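
The contrast between a mean-based and a median-based curve can be sketched as follows. This is only an illustration under stated assumptions: each request is assumed to supply a precision value at the same fixed set of recall levels, and the function name and the sample values are hypothetical, not data from Figures 12 and 13.

```python
from statistics import mean, median


def average_curves(curves):
    """Combine per-request precision-recall curves level by level.

    curves: list of lists; curves[i][j] is the precision of request i
            at the j-th recall level (same levels for every request).
    Returns (mean_curve, median_curve).
    """
    # Transpose so that each tuple holds all requests' values at one level.
    by_level = list(zip(*curves))
    mean_curve = [mean(values) for values in by_level]
    median_curve = [median(values) for values in by_level]
    return mean_curve, median_curve


# Hypothetical precision values for three requests at two recall levels.
sample_curves = [
    [1.0, 0.5],
    [0.2, 0.1],
    [0.3, 0.3],
]
mean_curve, median_curve = average_curves(sample_curves)
print("mean curve:  ", mean_curve)
print("median curve:", median_curve)
```

When individual results are widely scattered, a single very good or very poor request shifts the mean more than the median, which is one reason the two curves can differ.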
B) Cut-off Techniques
Cut-off techniques in conventional manual and mechanized retrieval
systems usually depend on the search terms used, with specified term-matches
establishing the cut-off points. The equivalent in SMART is the use of the
correlation coefficient that is obtained between the request and each document,
but the provision of ranked output permits other cut-off criteria to be used,
specifically related to the exact number, or acceptability, of the documents
as they are examined. Cut-off techniques for experimental purposes must be
based on methods applicable to all requests, regardless of variations in the
number of relevant items. For this reason only the ranked output list is used.
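
The two kinds of cut-off described above can be sketched in a few lines. This is a hedged illustration, not the SMART procedure itself: it assumes that a correlation coefficient between the request and each document is already available, and the function names, document identifiers, and coefficient values are all hypothetical.

```python
def rank_documents(correlations):
    """Return (document, coefficient) pairs in decreasing order of correlation.

    correlations: dict mapping a document identifier to its correlation
                  with the request.
    """
    return sorted(correlations.items(), key=lambda pair: pair[1], reverse=True)


def cutoff_by_correlation(ranked, threshold):
    """Keep every document whose correlation reaches the threshold."""
    return [(doc, c) for doc, c in ranked if c >= threshold]


def cutoff_by_rank(ranked, k):
    """Keep the first k documents of the ranked output list."""
    return ranked[:k]


# Hypothetical correlations for one request.
ranked = rank_documents({"d1": 0.40, "d3": 0.82, "d7": 0.61, "d9": 0.12})
print(cutoff_by_correlation(ranked, 0.40))
print(cutoff_by_rank(ranked, 2))
```

A rank cut-off retrieves the same number of documents for every request, whatever the coefficient values happen to be, whereas a correlation threshold may retrieve many documents for one request and none for another.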