Scientific Report No. IRS-13 Information Storage and Retrieval

Evaluation Parameters (chapter), E. M. Keen
Harvard University; Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

...that cut-off point. Other similar examples can be constructed, and the conclusion is that the normalized "sliding ratio" measure lacks many valuable features.

The "operating characteristic" curves used by John Swets [3] plot the recall ratio against the fallout ratio, both of which he describes in terms of probabilities. The fallout ratio has been used in previous experiments [2] and is discussed in Part 6. Swets uses this measure because operating characteristic curves can be examined in terms of statistical decision theory. If the curves follow a suitable theoretical model, a single number measure may then be derived to represent the whole curve. Swets applies this approach to some results from SMART and other experimental tests, but the fit with the model curves is only partially successful. As shown in Figure 7, both an "S" value and an "E" value are strictly required to characterize an operating characteristic curve. Although this kind of measure is suitable for reflecting the system efficiency viewpoint, and meets properties 1, 2, 3, 4, and 7 nearly perfectly, it does not and cannot display user satisfaction in terms of precision. It therefore does not meet properties 5 and 6 (Figure 2).

C) Comparison of Single Number and Curve Measures

The relationship between the single number normalized measures on the one hand and the precision recall curve on the other has not yet been established theoretically. Both types of measure are obtained for every retrieval run. In the vast majority of cases, when two runs are compared for effectiveness, the two types of measure agree on which run is better.
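The quantities compared above can be made concrete with a small computation. The sketch below is a minimal illustration, not the procedure used in these experiments. It assumes the usual SMART definitions of normalized recall and normalized precision, which this section does not restate. It represents a run by the ranks at which its relevant documents appear, and it uses a hypothetical ten-document collection with two made-up runs. The function names `cutoff_table`, `normalized_recall`, and `normalized_precision` are illustrative only. The (fallout, recall) pairs from the cut-off table are the points an operating characteristic curve would plot.

```python
import math


def cutoff_table(ranks, n_docs):
    """Return (k, recall, precision, fallout) at each cut-off k = 1..N.

    ranks:  1-based ranks at which the relevant documents were retrieved
            (assumes 0 < len(ranks) < n_docs).
    n_docs: collection size N; the whole collection is assumed to be ranked.
    """
    relevant = set(ranks)
    n_rel = len(relevant)
    n_nonrel = n_docs - n_rel
    rows, hits = [], 0
    for k in range(1, n_docs + 1):
        if k in relevant:
            hits += 1
        recall = hits / n_rel              # relevant retrieved / all relevant
        precision = hits / k               # relevant retrieved / all retrieved
        fallout = (k - hits) / n_nonrel    # nonrelevant retrieved / all nonrelevant
        rows.append((k, recall, precision, fallout))
    return rows


def normalized_recall(ranks, n_docs):
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n = len(ranks)
    ideal = n * (n + 1) // 2  # relevant documents at ranks 1..n
    return 1 - (sum(ranks) - ideal) / (n * (n_docs - n))


def normalized_precision(ranks, n_docs):
    """1 - (sum log r_i - sum log i) / log(N! / ((N - n)! n!))."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # log of the binomial coefficient, via lgamma to avoid huge factorials
    worst = math.lgamma(n_docs + 1) - math.lgamma(n_docs - n + 1) - math.lgamma(n + 1)
    return 1 - (actual - ideal) / worst


if __name__ == "__main__":
    N = 10
    runs = {"A": [1, 2, 5], "B": [1, 4, 8]}  # hypothetical ranked runs
    for name, ranks in runs.items():
        print(name,
              round(normalized_recall(ranks, N), 3),
              round(normalized_precision(ranks, N), 3))
        # (fallout, recall) points of the operating characteristic curve
        print([(round(f, 3), round(r, 3)) for _, r, _, f in cutoff_table(ranks, N)])
```

For these two made-up runs, both normalized measures favor run A. A's precision is also at least as high as B's at each recall level (1.0, 1.0, 0.6 for A against 1.0, 0.5, 0.375 for B). The single number and curve measures therefore agree, which is the outcome the text describes for most comparisons.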
For example, the two average sets of results given in Figure 5 show that both normalized recall and normalized precision favor the Abstracts, Thesaurus option, and the same result is given by the precision recall curve, since