IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
that cut-off point. Other similar examples can be constructed, and the
conclusion is that the normalized "sliding ratio" measure does not include
many valuable features.
The "operating characteristic" curves used by John Swets [3] use
a graph of the recall ratio and the fallout ratio, described by him in terms
of probabilities. The fallout ratio has been used in previous experiments
[2], and is discussed in Part 6. Swets uses this measure because the operating
characteristic curves may be examined in terms of statistical decision theory,
and, hopefully, a single number measure may be derived to represent the whole
curve, if the curves follow some suitable theoretical model. Some results
from SMART and other experimental tests are used by Swets, but the resulting
fit with the model curves is only partially successful, in that an "S" value
as well as an "E" value are strictly required to characterize an operating
characteristic curve, as shown in Figure 7. It should be noted that although
this kind of measure is suitable for reflecting the system efficiency viewpoint,
and meets nearly perfectly properties 1, 2, 3, 4, and 7, it does not and cannot
display user satisfaction in terms of precision, and therefore does not meet
properties 5 and 6 (Figure 2).
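
As an illustration of how the points of such a curve arise, the following minimal Python sketch computes the recall ratio and the fallout ratio after each cut-off in a ranked output list. It assumes the usual definitions (recall = relevant retrieved / total relevant; fallout = non-relevant retrieved / total non-relevant) and assumes that the ranking covers the whole collection. The function name and the six-document ranking are hypothetical and are not taken from the SMART runs or from Swets.

```python
from typing import List, Tuple


def operating_characteristic(ranked_relevance: List[bool]) -> List[Tuple[float, float]]:
    """Return one (fallout, recall) point per cut-off in a ranked list.

    ranked_relevance[i] is True when the document at rank i + 1 is relevant.
    The list is assumed to rank every document in the collection.
    """
    total_relevant = sum(ranked_relevance)
    total_nonrelevant = len(ranked_relevance) - total_relevant
    if total_relevant == 0 or total_nonrelevant == 0:
        raise ValueError("need at least one relevant and one non-relevant document")

    points = []
    relevant_seen = 0
    nonrelevant_seen = 0
    for is_relevant in ranked_relevance:
        if is_relevant:
            relevant_seen += 1
        else:
            nonrelevant_seen += 1
        recall = relevant_seen / total_relevant            # relevant retrieved / all relevant
        fallout = nonrelevant_seen / total_nonrelevant     # non-relevant retrieved / all non-relevant
        points.append((fallout, recall))
    return points


# Hypothetical ranking of a six-document collection (True = relevant).
ranking = [True, False, True, False, False, False]
curve = operating_characteristic(ranking)
for cutoff, (fallout, recall) in enumerate(curve, start=1):
    print(f"cut-off {cutoff}: fallout={fallout:.2f} recall={recall:.2f}")
```

Both ratios are normalized by collection totals, so the plotted points carry no direct information about the proportion of retrieved documents that are relevant; this is consistent with the remark above that the curve does not display precision.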
C) Comparison of Single Number and Curve Measures
The relationship between the single number normalized measures
on the one hand, and the precision recall curve on the other has not yet been
theoretically established. Both types of measures are obtained for every
retrieval run, and in the vast majority of cases the two types of measure
give the same merit when two runs are being compared for effectiveness.
For example, the two average sets of results given in Figure 5 show that both
normalized recall and normalized precision favor the Abstracts, Thesaurus
option, and the same result is given by the precision recall curve, since