Scientific Report No. IRS-13 Information Storage and Retrieval

Test Environment chapter
E. M. Keen
Harvard University
Gerard Salton

Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The comparisons in columns 4 and 5 use the normalized evaluation measures, while columns 3 and 9 use the usual curves of fallout versus recall and precision versus recall. The entries in the table show, for example, that for the ADI comparison the fallout versus recall curve gives superior performance for the specific requests, while the precision versus recall curve shows the general requests to be superior. Any slight crossing of the curves is ignored in this table.

The entries in columns 2, 7 and 8 are explained by the example given in Fig. 16. Using the IRE-3 results, the detailed performance results of the specific and general requests are compared at five document cut-off levels. Although the general requests retrieve a greater number of relevant documents at each cut-off point than the specific requests, they achieve a lower recall ratio each time, since with the general requests there are many more relevant documents to find. The general requests are also seen to achieve better fallout and precision ratios at each cut-off.

Returning to columns 3 and 7 in Fig. 15, with only one exception the precision versus recall curve shows the general requests to be best, and the fallout versus recall curves all favor the specific requests. The exception, noted in the case of IRE-1, may be explained by the fact that these 17 staff-prepared requests perform very much better than any other set for any of the collections, and the useful length of these requests seems to offset the generality effect which favors the general requests in the set. This description reveals the difficulties involved in making this type of test comparison.
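The cut-off comparison described above can be illustrated with a small sketch. It assumes the standard definitions of the three ratios at a document cut-off: recall is relevant retrieved over total relevant, precision is relevant retrieved over documents retrieved, and fallout is non-relevant retrieved over total non-relevant. The function name, the collection size of 200, and the request figures are all hypothetical, chosen only to reproduce the pattern noted in the text. They are not values taken from Fig. 16.

```python
from dataclasses import dataclass


@dataclass
class CutoffMeasures:
    recall: float     # relevant retrieved / total relevant
    precision: float  # relevant retrieved / documents retrieved
    fallout: float    # non-relevant retrieved / total non-relevant (lower is better)


def cutoff_measures(relevant_retrieved, cutoff, total_relevant, collection_size):
    """Compute recall, precision and fallout at a single document cut-off.

    All arguments are counts. The helper is illustrative and not part of
    any system described in the report.
    """
    if not 0 <= relevant_retrieved <= min(cutoff, total_relevant):
        raise ValueError("relevant_retrieved is inconsistent with the other counts")
    if total_relevant <= 0 or cutoff <= 0 or collection_size <= total_relevant:
        raise ValueError("counts must describe a non-trivial collection")
    nonrelevant_retrieved = cutoff - relevant_retrieved
    total_nonrelevant = collection_size - total_relevant
    return CutoffMeasures(
        recall=relevant_retrieved / total_relevant,
        precision=relevant_retrieved / cutoff,
        fallout=nonrelevant_retrieved / total_nonrelevant,
    )


# Hypothetical figures: a 200-document collection, cut-off after 10 documents.
# The specific request has 5 relevant documents in all, the general request 20.
specific = cutoff_measures(relevant_retrieved=3, cutoff=10,
                           total_relevant=5, collection_size=200)
general = cutoff_measures(relevant_retrieved=6, cutoff=10,
                          total_relevant=20, collection_size=200)

print(specific)
print(general)
```

With these assumed counts the general request retrieves more relevant documents (6 against 3) yet reaches a lower recall (0.3 against 0.6), while its precision is higher and its fallout lower. This is the same pattern the text reports for a fixed cut-off. It does not contradict the fallout versus recall curves, which compare the two request groups at equal recall and not at equal cut-off.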
As is suggested in Section II, part 7 or 8, user-oriented evaluation seems to be performed best by recognizing two