Scientific Report No. IRS-13 Information Storage and Retrieval

Evaluation Parameters (chapter). E. M. Keen. Harvard University. Gerard Salton. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The fifth method uses an extrapolation at constant precision; that is, the precision ratio of the first relevant document retrieved is held constant as the curve is extrapolated to 0.0 recall. Figure 22 includes the four examples for this method. This method has the best documentary interpretation from a user viewpoint. In cases a) and b), intermediate points on the extrapolated part of the curve give an accurate precision ratio that can actually be achieved at low recall values, and in cases c) and d) this extrapolation seems fairer for averaging purposes than any of methods 2 to 4. It does mean that the precision value at low recall depends on the precision achieved when the first relevant document is encountered, and a later relevant document may give slightly higher precision (as in Figure 22 case b)); usually, however, the extrapolation is sensible.

The foregoing discussion of extrapolation techniques is partly an academic one, since in the test comparisons made within SMART, comparative merit will not be affected by the choice of extrapolation method when the request set is unaltered. Method 3, which has been used in runs made at Harvard, does not correctly indicate merit at the left end of the curve if comparisons involving changes in the request set, or in average generality, are to be made. For example, Figure 23 a) shows three hypothetical requests with differing numbers of relevant items that are badly served by this method at, say, 0.2 recall: the order of merit it indicates for the three requests is actually the reverse of the true order. For this reason, it is preferable that extrapolation method 5 be used in further work.
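To make the constant-precision rule of method 5 concrete, here is a minimal Python sketch. It assumes each request is given as its full ranked output (a list of booleans marking relevant documents) together with its number of relevant items. The function names `rp_points`, `precision_at` and `average_curve`, the eleven recall levels 0.0 to 1.0, and the straight-line interpolation between observed points are all hypothetical choices for illustration; the text above specifies only the extrapolation below the first relevant document. This is not the SMART system's own code.

```python
from typing import List, Sequence, Tuple

# Assumption: eleven standard recall levels 0.0, 0.1, ..., 1.0.
RECALL_LEVELS = [i / 10 for i in range(11)]


def rp_points(ranking: Sequence[bool], n_relevant: int) -> List[Tuple[float, float]]:
    """Return (recall, precision) at the rank of each relevant document retrieved.

    `ranking` is the full ranked output for one request (True = relevant); it must
    contain all `n_relevant` relevant documents, so the final recall is 1.0.
    """
    if n_relevant <= 0 or sum(ranking) != n_relevant:
        raise ValueError("ranking must contain every relevant document")
    points = []
    found = 0
    for rank, is_relevant in enumerate(ranking, start=1):
        if is_relevant:
            found += 1
            points.append((found / n_relevant, found / rank))
    return points


def precision_at(points: List[Tuple[float, float]], r: float) -> float:
    """Precision of one request at recall level r."""
    first_recall, first_precision = points[0]
    if r <= first_recall:
        # Method 5: hold the precision of the first relevant document
        # constant as the curve is extrapolated down to 0.0 recall.
        return first_precision
    for (r0, p0), (r1, p1) in zip(points, points[1:]):
        if r0 <= r <= r1:
            # Assumption: straight-line interpolation between observed points.
            return p0 + (p1 - p0) * (r - r0) / (r1 - r0)
    raise ValueError("recall level must lie in [0, 1]")


def average_curve(requests: Sequence[Tuple[Sequence[bool], int]]) -> List[float]:
    """Mean precision over all requests at each standard recall level."""
    curves = [[precision_at(rp_points(ranking, n), r) for r in RECALL_LEVELS]
              for ranking, n in requests]
    return [sum(curve[i] for curve in curves) / len(curves)
            for i in range(len(RECALL_LEVELS))]
```

For a request ranked `[False, True, True, False, True]` with three relevant items, the first relevant document appears at rank 2 (recall 1/3, precision 0.5), so `precision_at` returns 0.5 at every recall level up to 1/3, even though the second relevant document, at rank 3, raises precision to 2/3. This is the kind of case the text describes for Figure 22 case b).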
A comparison of methods 3 and 5 is made in Figure 23 b), showing that the difference between the curves averaged by a recall-level ("Quasi-Cranfield") cut-off is quite small except at the high-precision end. If it is thought important to know, at each recall level on the curve, how many of the requests were averaged using an extrapolated part of the individual curves, and how many have enough relevant items to actually enter the average