Scientific Report No. IRS-13 Information Storage and Retrieval
Evaluation Parameters
E. M. Keen
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
their view the usual precision and recall can only be used in situations
where relevance decisions are black or white. An example of a performance
characteristic curve using relevance grades is given in Figure 26(a). The
Cran-1 collection is used because grades of relevance on a scale of four are
available for these relevance decisions; thus a "point score" is assigned to
those requests, giving a score of four to the most relevant documents, three to
the next, and two and one to the final two grades. Figure 26(a) then uses
these cumulated relevance points on the y axis as indicating a type of recall,
and uses rank positions (cut-off ratio) on the x axis. Two dictionaries are
compared, and the best possible performance curve is displayed.
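
The point-score construction can be illustrated with a minimal Python sketch. The grade list is hypothetical data, not taken from the Cran-1 collection, and the function names are illustrative. Treating the "best possible" curve as the one obtained by ranking documents in descending order of grade is an assumption made here for illustration.

```python
from itertools import accumulate

def cumulated_points(ranked_grades):
    """Running total of relevance points at each rank position (cut-off)."""
    return list(accumulate(ranked_grades))

def best_possible(ranked_grades):
    """Assumed ideal curve: the same documents ranked by descending grade."""
    return list(accumulate(sorted(ranked_grades, reverse=True)))

# Hypothetical ranked output for one request:
# 4 = most relevant grade ... 1 = least relevant grade, 0 = non-relevant.
grades = [3, 0, 4, 1, 0, 2]

curve = cumulated_points(grades)   # y values for rank positions 1..6
ideal = best_possible(grades)      # upper bound at each rank position

for rank, (got, best) in enumerate(zip(curve, ideal), start=1):
    print(f"rank {rank}: cumulated points = {got}, best possible = {best}")
```

Plotting `curve` against rank position gives a curve of the kind described for Figure 26(a); comparing two dictionaries would amount to computing one such list per dictionary.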
However, as has been demonstrated in [2], it is not correct to assume
that precision and recall are incapable of handling relevance grades. Figure 26(b)
uses the same data and displays two precision recall graphs, where recall is
based on the relevance points score rather than on the more usual document score.
In fact, the relative merit of the two options compared is identical in the two
figures, as it must be mathematically, so that the curves cross at the same point;
furthermore, the rank position value can be indicated on the precision recall
graph as shown.
The performance characteristic curve does not give any directly visible
information about the amount of non-relevant material being retrieved; the conclusion
is then that precision is of value here. Additional precision recall graphs
based on relevance grades are given in Section I of this report.
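
A precision recall computation with recall based on relevance points might be sketched as follows. The data are hypothetical and the function name is illustrative. Recall is taken as the fraction of the total relevance points retrieved at each rank position; precision is assumed here to be counted on documents in the usual way (any grade above zero counts as relevant), which the text does not state explicitly.

```python
def graded_precision_recall(ranked_grades):
    """Return (rank, recall, precision) at each rank position.

    recall    = cumulated relevance points / total relevance points
    precision = relevant documents retrieved / documents retrieved
                (assumption: a document is relevant if its grade > 0)
    """
    total_points = sum(ranked_grades)
    if total_points == 0:
        raise ValueError("request has no relevant documents")
    rows = []
    points = 0
    relevant_seen = 0
    for rank, grade in enumerate(ranked_grades, start=1):
        points += grade
        if grade > 0:
            relevant_seen += 1
        rows.append((rank, points / total_points, relevant_seen / rank))
    return rows

# Hypothetical ranked output: 4 = most relevant grade, 0 = non-relevant.
grades = [3, 0, 4, 1, 0, 2]
rows = graded_precision_recall(grades)

for rank, recall, precision in rows:
    print(f"rank {rank}: recall = {recall:.2f}, precision = {precision:.2f}")
```

Because each row carries its rank position, the cut-off can be marked on the precision recall graph, and the drop in precision at ranks 2 and 5 shows the non-relevant material that the cumulated-points curve alone does not reveal.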
It is also quite a simple matter to modify the single number measures
to incorporate grades of relevance. For example, using the normalized recall
measure, a "Weighted Normalized Recall" may be defined:
\[
\text{Weighted Normalized Recall} = 1 - \frac{\sum_{i=1}^{n} \cdots}{n(N-n)}
\]