ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
chapter
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
difference between these two over-all measures lies in the weighting
given to the relative position of the relevant documents in the ordered
retrieval list. The recall index (equation (5.20)) weights rank order
uniformly, and is therefore equally sensitive to the rank of every
relevant document. The precision index (equation (5.23)), however,
weights initial ranks more strongly, and is therefore more sensitive to
the system's behavior as reflected by the initial distribution of
retrieved documents.
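The difference in weighting can be checked numerically. The following is a minimal sketch only: equations (5.20) and (5.23) are not reproduced in this passage, so the code assumes the commonly published forms of the normalized recall, 1 - (sum r_i - sum i) / (n(N - n)), and the normalized precision, 1 - (sum log r_i - sum log i) / log(N! / ((N - n)! n!)), where the r_i are the ranks of the n relevant documents in a collection of N. The function names are hypothetical and chosen for illustration.

```python
import math


def normalized_recall(ranks, n_docs):
    """Assumed form of the rank recall index: uniform weighting of ranks."""
    n_rel = len(ranks)
    ideal = sum(range(1, n_rel + 1))  # best case: relevant docs at ranks 1..n
    return 1.0 - (sum(ranks) - ideal) / (n_rel * (n_docs - n_rel))


def normalized_precision(ranks, n_docs):
    """Assumed form of the log precision index: logarithmic weighting of ranks."""
    n_rel = len(ranks)
    ideal = sum(math.log(i) for i in range(1, n_rel + 1))
    actual = sum(math.log(r) for r in ranks)
    return 1.0 - (actual - ideal) / math.log(math.comb(n_docs, n_rel))


# One relevant document in a collection of 10. Compare the loss caused by
# slipping one position at the top (1 -> 2) and at the bottom (9 -> 10).
N = 10
recall_loss_top = normalized_recall([1], N) - normalized_recall([2], N)
recall_loss_tail = normalized_recall([9], N) - normalized_recall([10], N)
precision_loss_top = normalized_precision([1], N) - normalized_precision([2], N)
precision_loss_tail = normalized_precision([9], N) - normalized_precision([10], N)

print(recall_loss_top, recall_loss_tail)        # equal under uniform weighting
print(precision_loss_top, precision_loss_tail)  # larger loss at the top
```

Under these assumed forms, a one-position slip costs the recall index the same amount wherever it occurs, while the precision index is penalized several times more heavily for a slip at the head of the list than for one at the tail.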
The recall and precision indices derived here depend on the
assumption that the ordering induced on D by N is a full order, i.e.,
that it can be represented by a one-to-one mapping from D to the dense
set of integers from 1 to n(D). In general this may not be the case
since a partial order rather than a full order may result from a given
retrieval operation; therefore a method for defining document rank in
this event is required.
The most natural way of treating documents which are equivalent
under a partial retrieval ordering is to give each member of the
equivalent set the average of the ranks which would apply to the set
members if they were differentiable. Hence, if N induces the partial
order d1 > d2 > d3 = d4 = d5 > d6 on a set D = {d1, d2, d3, d4, d5, d6},
ranks are assigned in the sequence: 1, 2, 4, 4, 4, 6.
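The averaging rule for equivalent documents can be sketched as follows. This is an illustrative sketch, not a procedure given in the report: it assumes the partial order is supplied as a list of tiers, best first, with equivalent documents grouped in the same tier. The function name `average_ranks` and the tier representation are hypothetical.

```python
def average_ranks(tiers):
    """Assign each document the mean of the ranks its tier would occupy.

    `tiers` is a list of lists, ordered from best to worst; documents in
    the same inner list are equivalent under the retrieval ordering.
    """
    ranks = {}
    next_rank = 1  # first rank position not yet consumed
    for tier in tiers:
        size = len(tier)
        # Mean of the consecutive ranks next_rank .. next_rank + size - 1.
        shared = next_rank + (size - 1) / 2
        for doc in tier:
            ranks[doc] = shared
        next_rank += size
    return ranks


# Six documents with d3, d4, and d5 equivalent under the ordering.
example = [["d1"], ["d2"], ["d3", "d4", "d5"], ["d6"]]
print(average_ranks(example))
# {'d1': 1.0, 'd2': 2.0, 'd3': 4.0, 'd4': 4.0, 'd5': 4.0, 'd6': 6.0}
```

The three equivalent documents would occupy ranks 3, 4, and 5 if they could be distinguished, so each receives their mean, 4. The sum of the assigned ranks is the same as for any full order consistent with the partial order.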
In the derivation above of the normalized rank recall (eq.
(5.20)) and the normalized log precision (eq. (5.23)) it was assumed
that all members of the set of relevant documents D_H were of equal value.
Consider now an extension of these indices by assuming that a partial
ordering on D_H is specified which reflects degree of relevance, i.e.,