IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
and no account is taken for cut-off purposes of the correlation, although
one study using correlation magnitudes has been made [10].
Using the precision recall pairs that can be computed as each document
in the output list is examined (Figure [OCRerr]), three cut-off methods seem feasible.
The first method is to obtain average curves from all requests just as drawn
in Figure 14, by computing mean precision recall pairs for each document cut-
off level. If done by hand, the cut-off points may be recorded on the curve
as in Figure 14 a), or a computer-produced average may be used which produces
precision at ten recall levels for plotting convenience, Figure 14 b). This
technique is referred to as the "pseudo-Cranfield" method, and although it is
available for many runs it is not generally used for SMART evaluations. One
advantage of this method is that it seems to be fully user-oriented, since the
plot of Figure 14 a) shows how many documents a typical user must examine to
get X% recall. Another advantage is that computation does not depend on
the interpolation and extrapolation techniques that are required for the other
methods to be described. A disadvantage stems from the fact that the requests
vary in their number of relevant items, so that if one of the requests has only
a single relevant document, any cut-off made at 2 or more documents will not
give 1.0 precision even if every request performs perfectly. One simple
solution is to plot the theoretical best possible curve for a given set of
requests, as is done in Figure 14 a).
It is a simple matter to use this cut-off method with macro evaluation, as
the macro curve in Figure U was obtained this way.
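The first cut-off method can be illustrated with a minimal sketch in Python. It is not the SMART implementation. The function names, the data layout (a ranked list of relevance judgments plus the total number of relevant documents for each request), and the example requests are all assumptions made for illustration. The sketch also assumes that "macro evaluation" means averaging the per-request ratios at each document cut-off level.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: (relevance of each ranked document, total relevant for the request)
Request = Tuple[Sequence[bool], int]


def pairs_at_cutoffs(ranked: Sequence[bool], n_relevant: int) -> List[Tuple[float, float]]:
    """(precision, recall) after each document in the output list is examined."""
    pairs = []
    found = 0
    for rank, is_relevant in enumerate(ranked, start=1):
        found += int(is_relevant)
        pairs.append((found / rank, found / n_relevant))
    return pairs


def average_by_cutoff(requests: Sequence[Request]) -> List[Tuple[float, float]]:
    """Mean precision and mean recall over all requests at each document cut-off."""
    per_request = [pairs_at_cutoffs(ranked, n) for ranked, n in requests]
    depth = min(len(p) for p in per_request)  # only cut-offs every request reaches
    count = len(per_request)
    return [
        (
            sum(p[k][0] for p in per_request) / count,
            sum(p[k][1] for p in per_request) / count,
        )
        for k in range(depth)
    ]


def best_possible(requests: Sequence[Request], depth: int) -> List[Tuple[float, float]]:
    """The same average for an ideal ranking: every relevant document ranked first."""
    ideal = [([i < n for i in range(depth)], n) for _, n in requests]
    return average_by_cutoff(ideal)


# Invented example: one request with two relevant documents, one with a single one.
example_requests = [
    ([True, False, True, False], 2),
    ([True, False, False, False], 1),
]

for cutoff, (p, r) in enumerate(average_by_cutoff(example_requests), start=1):
    print(f"cut-off {cutoff}: precision {p:.3f}, recall {r:.3f}")
```

For these invented requests, `best_possible(example_requests, 4)` gives a precision of 0.75 at a cut-off of 2 documents. This is the disadvantage noted above: the request with a single relevant document cannot exceed 0.5 precision at that cut-off, so even a perfect ranking falls short of 1.0.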
The second and third cut-off techniques use, respectively, precision
and recall ratios to determine the cut-off points at which averages will be
computed. A set of precision or recall values is picked in advance, and
requests are averaged essentially at the cut-off points at which the required