ISR10
Scientific Report No. ISR-10 Information Storage and Retrieval
Evaluation of Document Retrieval Systems
Joseph John Rocchio
Harvard University
Gerard Salton
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
subsets as the degree of association (defined by the matching relation)
is decreased. Each of these techniques is commonly used to represent
the variation in the joint distribution of the user/system decisions as
the quantity of output (the size of the retrieved subset) is increased
(or equivalently as the matching criterion is relaxed). In Section 4
of this chapter an alternative to the general evaluation strategy of
describing performance by a set of parameters which vary with discrete
changes in the matching criterion is presented.
D. The Precision-Recall Tradeoff
The use of a plot of precision vs. recall, parameterized by the
cutoff, as an evaluation tool for document retrieval systems
(introduced by Cleverdon [6]) has led to observations that there exists a
so-called tradeoff between these two conditional probabilities which
is of fundamental significance. It will be shown here, however, that
this inverse relationship is a direct consequence of assuming a
statistically significant matching function, and further that both
of these conditional probabilities are increased by any process which
improves the joint probability of retrieval and relevance.
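
As an illustration of how a precision vs. recall plot is traced out by the
cutoff parameter, the following is a minimal sketch in Python. It assumes
the system output is a single ranked list in order of decreasing degree of
association, with a binary user relevance judgment for each document. The
function name precision_recall_by_cutoff and the sample ranking are
hypothetical and are not taken from the report.

```python
def precision_recall_by_cutoff(ranked_relevance):
    """Return a (cutoff, precision, recall) triple for each cutoff 1..n.

    ranked_relevance is a list of booleans ordered by decreasing match
    score; True marks a document judged relevant (illustrative input).
    """
    total_relevant = sum(ranked_relevance)
    if total_relevant == 0:
        raise ValueError("at least one relevant document is required")

    points = []
    relevant_retrieved = 0
    for cutoff, is_relevant in enumerate(ranked_relevance, start=1):
        relevant_retrieved += is_relevant
        # Precision: relevant retrieved / total retrieved at this cutoff.
        precision = relevant_retrieved / cutoff
        # Recall: relevant retrieved / total relevant (fixed per request).
        recall = relevant_retrieved / total_relevant
        points.append((cutoff, precision, recall))
    return points


# Hypothetical ranking of ten documents, four of them relevant.
ranking = [True, True, False, True, False, False, True, False, False, False]
points = precision_recall_by_cutoff(ranking)
for cutoff, precision, recall in points:
    print(f"cutoff={cutoff:2d}  precision={precision:.2f}  recall={recall:.2f}")
```

In this sample, recall never decreases as the cutoff grows, while precision
falls from 1.0 at the first document to 0.4 when all ten are retrieved.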
The increase in recall as the amount of output accepted as
retrieved is increased is a direct consequence of the definition of
the recall conditional probability. Since the retrieved subset is
monotonically increasing, the ratio of relevant documents retrieved to
total number of relevant documents (a constant for any retrieval
operation) is necessarily monotonically increasing. Precision,
however, is defined as the ratio of relevant documents retrieved to