Scientific Report No. IRS-13 Information Storage and Retrieval
Chapter: Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the curve for "Abstracts Thesaurus" is closer to the 1.0 precision, 1.0 recall
corner over the whole of its range than the curve for "Abstracts Stem".
A minority of results do not show such complete agreement. The comparison
presented in Figure 8 shows that the merit indicated by the curve agrees with
the merit assigned by the normalized measures only above 0.9 recall. Two
individual requests from the request set used are given in Figure 9, showing
that although in both requests the normalized measures strongly favor the
"Cosine Logical" option, some portions of the precision-recall curve favor
"Cosine Numeric". In request QA12 the ranks of the last two relevant documents
favor "Cosine Numeric", but the normalized measures are more directly influenced
by the larger rank changes at the top rank positions, which favor "Cosine Logical".
In the other request the same effects cause the high-precision end of the curve
to favor "Cosine Numeric". Clearly, single-number measures cannot reflect
crossing performance curves unless they are specifically designed to reflect
merit at a particular point on the curve. The normalized measures do not meet
this condition, and it is not always correct to say that normalized recall
reflects merit at the high-recall end of the curve and normalized precision
at the high-precision end. For example, Figure 10 shows a result in which the
average curve for "First Iteration" is better than "Initial Search" at all
points, yet normalized recall indicates that "Initial Search" is better. This
occurs because "First Iteration" improves the ranks of quite a few relevant
documents that were already highly ranked in "Initial Search" (so normalized
precision is best for "First Iteration"), while some other relevant documents
that were poorly ranked in "Initial Search" are pushed down by quite large
amounts in "First Iteration". This causes normalized recall to drop without
affecting the curve appreciably.
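The mechanism described above can be illustrated with a small sketch. It assumes the usual Rocchio-style definitions of the normalized measures used with the SMART system, which this excerpt does not restate: for N documents, n relevant documents, and r_i the rank of the i-th relevant document, normalized recall is 1 - (sum r_i - sum i) / (n(N - n)), and normalized precision applies the same idea to log ranks, divided by log C(N, n). The function names, the collection size of 200, and both rankings are hypothetical, chosen only to mimic the Figure 10 situation for a single request. They are not data from the report.

```python
import math
from typing import List, Sequence, Tuple


def normalized_recall(ranks: Sequence[int], n_docs: int) -> float:
    """1 - (sum of actual ranks - sum of ideal ranks) / (n * (N - n))."""
    n = len(ranks)
    ideal = n * (n + 1) // 2  # ranks 1..n: every relevant document at the top
    return 1.0 - (sum(ranks) - ideal) / (n * (n_docs - n))


def normalized_precision(ranks: Sequence[int], n_docs: int) -> float:
    """Same idea on log ranks, so changes near the top of the list weigh more."""
    n = len(ranks)
    actual = sum(math.log(r) for r in ranks)
    ideal = sum(math.log(i) for i in range(1, n + 1))
    # Worst-case minus best-case log-rank sum equals log C(N, n).
    spread = math.log(math.comb(n_docs, n))
    return 1.0 - (actual - ideal) / spread


def precision_at_each_relevant(ranks: Sequence[int]) -> List[Tuple[float, float]]:
    """(recall, precision) measured at the rank of each relevant document."""
    n = len(ranks)
    return [(k / n, k / r) for k, r in enumerate(sorted(ranks), start=1)]


N_DOCS = 200                    # hypothetical collection size
initial = [3, 5, 8, 20, 40]     # hypothetical "Initial Search" ranks of 5 relevant docs
first_iter = [1, 2, 3, 20, 70]  # top three improved, the last one pushed far down

for name, ranks in [("Initial Search", initial), ("First Iteration", first_iter)]:
    print(name,
          round(normalized_recall(ranks, N_DOCS), 3),
          round(normalized_precision(ranks, N_DOCS), 3))
    print("  ", [(rec, round(p, 3)) for rec, p in precision_at_each_relevant(ranks)])
```

Under these assumptions normalized precision rises from about 0.691 to 0.804 while normalized recall falls from about 0.937 to 0.917. The single-request points show precision improving at recall 0.2 to 0.6, equal at 0.8, and lower at 1.0, so these two per-request curves cross. Figure 10 concerns an average over a request set, where the report observes that such a change at the tail did not affect the curve appreciably.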