Information Retrieval Experiment
Retrieval effectiveness
Cornelis J. van Rijsbergen
Edited by Karen Sparck Jones
Butterworth & Company
Recall is defined as the probability that any given document (from the population of potential documents) that is input to the system and relevant to question Q will be retrieved in response to Q. Similarly, precision is defined as the probability that a retrieved document will be relevant. Notice that these definitions refer to individual documents, so for them to make sense in terms of ratios I need to restate them in terms of sets. If A is the set of relevant documents and B is the set of retrieved documents, then recall is P(B/A) (read: the probability of B given A) and precision is P(A/B); these can then be estimated simply by the traditional ratios. Now let us look at the meaning of these probabilities more closely. Take the probability of retrieval given relevance, P(B/A), or more precisely the probability of a document belonging to the retrieved set given that it belongs to the set of relevant documents. If A and B are two simple primitive predicates, this conditional probability is easy to interpret; however, B is not simple. If we are to estimate the probability P(B/A), we can only do so reliably through a random process, but a retrieval strategy is not a random process and thus technically cannot be used to estimate P(B/A).
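To make the traditional ratio estimates concrete, here is a minimal sketch (my own illustration, not part of the chapter) that computes estimates of recall P(B/A) and precision P(A/B) from a relevant set A and a retrieved set B; the document identifiers are invented for the example.

```python
# A minimal sketch: the traditional ratio estimates
#   recall    = |A ∩ B| / |A|   (estimate of P(B/A))
#   precision = |A ∩ B| / |B|   (estimate of P(A/B))

def recall_precision(relevant, retrieved):
    """Estimate recall and precision from a relevant set A and a retrieved set B."""
    A, B = set(relevant), set(retrieved)
    hits = len(A & B)                        # documents both relevant and retrieved
    recall = hits / len(A) if A else 0.0
    precision = hits / len(B) if B else 0.0
    return recall, precision

# Hypothetical document identifiers, purely for illustration.
A = {"d1", "d2", "d3", "d4"}   # relevant documents
B = {"d2", "d3", "d5"}         # retrieved documents
print(recall_precision(A, B))  # (0.5, 0.666...)
```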
One way of dealing with the definition of P(B/A) is to describe set B in terms of a random variable. One assumes that attributes of documents can be measured or ordered in some way in relation to a query, and that the values of these attributes can be mapped into a single variable (like co-ordination level or cosine correlation) having a well-defined distribution. If one further assumes that the distributions of this variable on the relevant and non-relevant sets of documents are different, then one particular value of this variable (the cut-off) can be used to discriminate between relevant and non-relevant documents. Now the probability of a document belonging to B conditional on A is well defined: it is the probability that, say, the matching function for relevant documents exceeds some threshold. It is not difficult to see how P(B/Ā), fallout, can be defined in a similar manner.
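The following sketch is again my own illustration, under the assumption that matching-function values for the relevant and non-relevant documents are available: given a cut-off, recall P(B/A) is estimated as the fraction of relevant documents whose value reaches the cut-off, and fallout P(B/Ā) as the same fraction for the non-relevant documents.

```python
# A minimal sketch: recall and fallout read off a single cut-off value of the
# matching function, treated as a random variable on each set of documents.

def recall_fallout_at_cutoff(relevant_scores, nonrelevant_scores, cutoff):
    """Fractions of relevant and non-relevant documents whose score reaches the cut-off."""
    recall = sum(s >= cutoff for s in relevant_scores) / len(relevant_scores)
    fallout = sum(s >= cutoff for s in nonrelevant_scores) / len(nonrelevant_scores)
    return recall, fallout

# Hypothetical matching-function values (e.g. cosine correlations).
rel = [0.9, 0.7, 0.6, 0.3]
nonrel = [0.5, 0.4, 0.2, 0.1, 0.05]
print(recall_fallout_at_cutoff(rel, nonrel, cutoff=0.5))  # (0.75, 0.2)
```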
Both P(B/A) and P(B/Ā) can be defined as the expected value of some
other variable. When defined in this way we refer to them as expected recall
and expected fallout. For this we assume that each document has associated
with it a unique description x. The uniqueness is not necessary but it simplifies
the discussion. Robertson1 in his paper on the probability ranking principle
defined them as follows:
P(B/A) = Σ_{x ∈ B} P(x/A)

P(B/Ā) = Σ_{x ∈ B} P(x/Ā)
In words, if one can associate a probability with each document description
given that the underlying document is relevant (or non-relevant), then
expected recall (or expected fallout) is simply defined as the sum of the
probabilities over the retrieved set. If these probabilities were continuous,
then the sum would represent the area under the curve P(x/A) or P(x/Ā)
supported by the set B. This is of course how Swets2 originally defined these
probabilities.
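As an illustration of the definitions quoted above (the document descriptions and probability values are hypothetical), expected recall and expected fallout can be computed by summing P(x/A) and P(x/Ā) over the retrieved set B:

```python
# A minimal sketch: expected recall and expected fallout as sums, over the
# retrieved set B, of the probabilities attached to each document description x.

def expected_recall_fallout(retrieved, p_x_given_A, p_x_given_notA):
    """Sum P(x/A) and P(x/Ā) over the document descriptions x in B."""
    exp_recall = sum(p_x_given_A[x] for x in retrieved)
    exp_fallout = sum(p_x_given_notA[x] for x in retrieved)
    return exp_recall, exp_fallout

# Hypothetical distributions over four document descriptions.
p_A    = {"x1": 0.4, "x2": 0.3, "x3": 0.2, "x4": 0.1}   # P(x/A)
p_notA = {"x1": 0.1, "x2": 0.2, "x3": 0.3, "x4": 0.4}   # P(x/Ā)
B = {"x1", "x2"}                                        # retrieved set
print(expected_recall_fallout(B, p_A, p_notA))          # approximately (0.7, 0.3)
```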
A slightly different and perhaps more instructive way of defining expected
recall and expected precision (rather than fallout, although this can be defined
too) is in terms of the expected number of relevant documents in a set. For
this we need to define P(A/x), or, in words, the probability that a document