Scientific Report No. IRS-13 Information Storage and Retrieval

Evaluation Parameters chapter, E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the curve for Abstracts Thesaurus is closer to the 1.0 precision at 1.0 recall corner over the whole of its range than the curve for Abstracts Stem.

A minority of results do not show such complete agreement. The comparison presented in Figure 8, for instance, shows that the merit indicated by the curves agrees with the merit assigned by the normalized measures only above 0.9 recall. Two individual requests from the request set used are given in Figure 9. Although the normalized measures strongly favor the "Cosine Logical" option in both requests, some portions of the precision recall curve favor "Cosine Numeric". In request QA12, the ranks of the last two relevant documents favor cosine numeric, but the normalized measures are more directly influenced by the larger rank changes at the top rank positions, which favor cosine logical. In the other request (QA…), the same effects cause the high precision end of the curve to favor cosine numeric.

Clearly, single-number measures cannot reflect crossing performance curves unless they are specifically designed to reflect merit at a particular point on the curve. The normalized measures are not designed in this way, and it is not always correct to say that normalized recall reflects merit at the high recall end of the curve and normalized precision at the high precision end. For example, Figure 10 shows a result in which the average curve for "First Iteration" is better than "Initial Search" at all points, yet normalized recall indicates that "Initial Search" is better.
This occurs because "First Iteration" improves the ranks of quite a few relevant documents that were already highly ranked in "Initial Search" (so normalized precision is better for "First Iteration"), while at the same time some other relevant documents that were poorly ranked in "Initial Search" are moved down by quite large amounts in "First Iteration". This lowers normalized recall without appreciably affecting the curve.
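The mechanism described above can be illustrated numerically. The Python sketch below assumes the usual SMART definitions of the two measures, which this section does not restate: for a request with n relevant documents at ranks r_1, ..., r_n in a collection of N documents, normalized recall is 1 - (Σ r_i - Σ i) / (n(N - n)), and normalized precision applies the same comparison to log ranks, 1 - (Σ log r_i - Σ log i) / log(N! / (n!(N - n)!)). The function names and the rank lists for "Initial Search" and "First Iteration" are hypothetical, chosen only to mimic the pattern described for Figure 10; they are not data from this report.

```python
import math

# Hypothetical toy data: ranks of the 5 relevant documents in a
# 100-document collection, before and after one feedback iteration.
N_DOCS = 100
INITIAL_SEARCH = [4, 7, 12, 50, 70]
FIRST_ITERATION = [1, 2, 3, 80, 95]


def normalized_recall(ranks, n_docs):
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n = len(ranks)
    ideal = n * (n + 1) // 2  # ideal ranks are 1, 2, ..., n
    return 1 - (sum(ranks) - ideal) / (n * (n_docs - n))


def normalized_precision(ranks, n_docs):
    """Same comparison on log ranks, so changes near the top weigh most."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = math.log(math.factorial(n))  # sum of log i for i = 1..n
    # Worst-case log-rank sum minus ideal log-rank sum = log C(N, n).
    spread = math.log(math.comb(n_docs, n))
    return 1 - (actual - ideal) / spread


def precision_at_each_relevant(ranks):
    """(recall, precision) at the rank where each relevant document appears."""
    n = len(ranks)
    return [((k + 1) / n, (k + 1) / r) for k, r in enumerate(sorted(ranks))]


for name, ranks in [("Initial Search", INITIAL_SEARCH),
                    ("First Iteration", FIRST_ITERATION)]:
    curve = ", ".join(f"{p:.3f}" for _, p in precision_at_each_relevant(ranks))
    print(f"{name:16s} Rnorm={normalized_recall(ranks, N_DOCS):.3f} "
          f"Pnorm={normalized_precision(ranks, N_DOCS):.3f} precision: {curve}")
```

With these toy ranks, Rnorm falls from about 0.73 to 0.65 while Pnorm rises from about 0.49 to 0.67, and precision at the two highest recall points drops by only about 0.03 and 0.02. Because Pnorm works on log ranks, moving three documents from ranks 4, 7 and 12 to ranks 1, 2 and 3 outweighs pushing the last two from 50 and 70 to 80 and 95; Rnorm, being linear in rank, is dominated by the large moves at the bottom.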