Information Retrieval Experiment
Retrieval effectiveness
Cornelis J. van Rijsbergen

Recall is defined as the probability that any given document (from the population of potential documents) that is input to the system and relevant to question Q will be retrieved in response to Q. Similarly, precision is defined as the probability that a retrieved document will be relevant. Notice that these definitions refer to individual documents, so for them to make sense in terms of ratios I need to restate them in terms of sets. If A is the set of relevant documents and B is the set of retrieved documents, then recall is P(B/A) (read: the probability of B given A) and precision is P(A/B); these can then be estimated simply by the traditional ratios.

Now let us look at the meaning of these probabilities more closely. Take the probability of retrieval given relevance, P(B/A), or more precisely the probability of a document belonging to the retrieved set given that it belongs to the set of relevant documents. If A and B are two simple, primitive predicates, this conditional probability is easy to interpret; B, however, is not simple. If we are to estimate the probability P(B/A), we can only do this reliably through a random process, but a retrieval strategy is not a random process and thus technically cannot be used to estimate P(B/A).

One way of dealing with the definition of P(B/A) is to describe the set B in terms of a random variable. One assumes that attributes of documents can be measured or ordered in some way in relation to a query, and that the values of these attributes can be mapped into a single variable (like co-ordination level or cosine correlation) having a well-defined distribution. If one further assumes that the distributions of this variable on the relevant and non-relevant sets of documents are different, then one particular value of this variable (the cut-off) can be used to discriminate between relevant and non-relevant documents. Now the probability of a document belonging to B conditional on A is well defined: it is the probability that, say, the value of the matching function for a relevant document exceeds some threshold. It is not difficult to see how P(B/Ā), fallout, can be defined in a similar manner.

Both P(B/A) and P(B/Ā) can be defined as the expected value of some other variable. When defined in this way we refer to them as expected recall and expected fallout. For this we assume that each document has associated with it a unique description x. The uniqueness is not necessary, but it simplifies the discussion. Robertson¹, in his paper on the probability ranking principle, defined them as follows:

    P(B/A) = Σ_{x ∈ B} P(x/A)      (expected recall)

    P(B/Ā) = Σ_{x ∈ B} P(x/Ā)      (expected fallout)

In words, if one can associate a probability with each document description given that the underlying document is relevant (or non-relevant), then expected recall (or expected fallout) is simply defined as the sum of these probabilities over the retrieved set. If these probabilities were continuous, then the sum would represent the area under the curve P(x/A) or P(x/Ā) supported by the set B.
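As a concrete illustration of the distinction between the set-based ratios and the expected values just defined, here is a minimal sketch in Python (not part of the original chapter). The document identifiers, matching-function scores, cut-off value and the probability estimates for P(x/A) and P(x/Ā) are all invented for the example; only the formulas follow the text.

```python
# Each document has a unique description x; here x is just a label.
relevant = {"d1", "d2", "d3", "d4"}                      # the set A
collection = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}
non_relevant = collection - relevant                     # the set A-bar

# A matching-function value (e.g. co-ordination level or cosine
# correlation) for each document against the question Q.
score = {"d1": 0.9, "d2": 0.7, "d3": 0.4, "d4": 0.2,
         "d5": 0.6, "d6": 0.3, "d7": 0.1, "d8": 0.05}

# The retrieved set B is everything whose score exceeds the cut-off.
cutoff = 0.5
retrieved = {d for d, s in score.items() if s > cutoff}  # the set B

# Recall P(B/A), precision P(A/B) and fallout P(B/A-bar) estimated
# by the traditional ratios over the sets.
recall = len(retrieved & relevant) / len(relevant)
precision = len(retrieved & relevant) / len(retrieved)
fallout = len(retrieved & non_relevant) / len(non_relevant)

# Expected recall and expected fallout in Robertson's sense: sums of
# P(x/A) and P(x/A-bar) over the retrieved set.  These distributions
# are simply assumed known here; in practice they must be estimated.
p_x_given_A    = {"d1": 0.40, "d2": 0.30, "d3": 0.20, "d4": 0.10,
                  "d5": 0.00, "d6": 0.00, "d7": 0.00, "d8": 0.00}
p_x_given_Abar = {"d1": 0.00, "d2": 0.05, "d3": 0.05, "d4": 0.10,
                  "d5": 0.20, "d6": 0.20, "d7": 0.20, "d8": 0.20}

expected_recall = sum(p_x_given_A[d] for d in retrieved)
expected_fallout = sum(p_x_given_Abar[d] for d in retrieved)

print(recall, precision, fallout)         # ratio estimates from this one sample
print(expected_recall, expected_fallout)  # expected values under the assumed distributions
```

Note that the ordinary recall and fallout ratios are computed from one particular sample of judged documents, whereas the expected versions are sums over assumed distributions; the two need not agree for any given sample.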
This area formulation is of course how Swets² originally defined these probabilities. A slightly different, and perhaps more instructive, way of defining expected recall and expected precision (rather than fallout, although this can be defined too) is in terms of the expected number of relevant documents in a set. For this we need to define P(A/x), or, in words, the probability that a document