IRS13
Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
chapter
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The fifth method uses an extrapolation at constant precision, that is,
the precision ratio of the first relevant document retrieved is held constant
as the curve is extrapolated to 0.0 recall. Figure 22 includes the four examples
for this method. This method has the best documentary interpretation from a
user viewpoint, since intermediate points on the extrapolated part of the curve
do give an accurate precision ratio that can be achieved at low recall values
in cases a) and b), and in cases c) and d) this extrapolation seems to be
fairer for averaging purposes than any of methods 2 to 4. This does mean that
the precision value at low recall is dependent on the precision achieved when
the first relevant document is encountered, and a later relevant document
may give slightly higher precision (as in Figure 22 case b)); usually, the
extrapolation is sensible.
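The constant-precision extrapolation described above can be illustrated with a minimal sketch in Python. The function names and the sample ranked list are hypothetical and are not taken from the SMART programs. The sketch covers only the extrapolated left end of the curve; how precision is obtained between observed points is not treated here.

```python
def recall_precision_points(ranked_relevance):
    """Return (recall, precision) after each relevant document in a ranked list.

    ranked_relevance is a sequence of booleans in rank order (hypothetical input
    format); True marks a relevant document.
    """
    total_relevant = sum(1 for flag in ranked_relevance if flag)
    if total_relevant == 0:
        raise ValueError("the request has no relevant documents")
    points = []
    found = 0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            found += 1
            points.append((found / total_relevant, found / rank))
    return points


def extrapolate_constant_precision(points):
    """Extend the curve to 0.0 recall, holding the precision of the first
    relevant document constant (method 5 as described in the text)."""
    first_recall, first_precision = points[0]
    return [(0.0, first_precision)] + list(points)


def extrapolated_precision(points, recall_level):
    """Precision read from the extrapolated part of the curve only.

    Returns None for recall levels beyond the first observed point, because
    the treatment of those levels is outside the scope of this sketch.
    """
    first_recall, first_precision = points[0]
    if 0.0 <= recall_level <= first_recall:
        return first_precision
    return None


# Hypothetical request: 4 relevant documents, found at ranks 3, 5, 6 and 10.
ranking = [False, False, True, False, True, True, False, False, False, True]
observed = recall_precision_points(ranking)
curve = extrapolate_constant_precision(observed)
low_recall_precision = extrapolated_precision(observed, 0.1)
```

In this hypothetical request the first relevant document appears at rank 3, so every recall level from 0.0 to 0.25 takes the precision 1/3. The second and third relevant documents give higher precision (0.4 and 0.5), which is the situation noted for Figure 22 case b).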
The foregoing discussion of different techniques for extrapolation
is partly an academic one, since in the test comparisons made within SMART
comparative merit will not be affected by the choice of extrapolation method
when the request set is unaltered. Method 3, which has been used in runs
made at Harvard, does not correctly indicate merit at the left end of the curve
if comparisons involving changes in request sets or in average generality are
to be made. For example, three hypothetical requests with differing numbers
of relevant items are seen in Figure 23 a) to be badly served by this method
at, say, 0.2 recall, where the indicated merit of the three requests is the
reverse of their true merit. For this reason, it is preferable that in further
work extrapolation method 5 be used. A comparison of methods 3 and 5 is made in Figure
23 b), showing that the difference in curves averaged by a recall level
("Quasi-Cranfield") cut-off is quite small except at the high precision end.
If it is thought important to know, at each recall level on the curve, how
many of the requests were averaged using an extrapolated part of the individual
curves, and how many have enough relevant items to actually enter the average