IRE
Information Retrieval Experiment
Retrieval effectiveness
chapter
Cornelis J. van Rijsbergen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
as the set of points at which there is a change of recall, and secondly an
interpolation function
P(R) = sup {P : R' >= R such that (R', P) ∈ G}
This is simply an algebraic expression for the interpolating step-function
with the jumps occurring at (Re, Pe). One consequence of defining the
function in this way is that certain points belonging to G may be `ignored'. To
see this consider two neighbouring points (R1, P1) and (R2, P2) such that
R2 > R1 but P2 > P1 (normally one would expect P2 < P1 because of the trade-
off). Then in interpolating for a value R0 immediately preceding R1 the P1
value will not be used, since P(R0) = P2. In an earlier publication I referred to
(R1, P1) as an anomaly, emphasizing that it could legitimately be ignored7.
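The interpolation rule can be sketched directly from its definition. The function and data names below are hypothetical; `G` is an invented set of observed (recall, precision) points containing one such anomalous point:

```python
def interpolated_precision(points, r):
    """Step-function interpolation: P(r) = sup{P : (R', P) in points, R' >= r}."""
    candidates = [p for (rr, p) in points if rr >= r]
    return max(candidates) if candidates else 0.0

# Invented observed points G; (0.4, 0.5) is 'anomalous' because the later
# point (0.6, 0.7) has both higher recall and higher precision.
G = [(0.2, 0.6), (0.4, 0.5), (0.6, 0.7), (0.8, 0.3)]

# Interpolating just before R1 = 0.4 skips P1 = 0.5, since the supremum
# over all points with R' >= 0.35 is dominated by (0.6, 0.7).
print(interpolated_precision(G, 0.35))  # 0.7
print(interpolated_precision(G, 0.7))   # 0.3
```

Running the sketch shows the anomalous point (0.4, 0.5) never contributes to the interpolated curve, which is exactly why it can legitimately be ignored.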
Other limits to retrieval effectiveness are due to the nature of the
mathematical model used to represent any subsystem. Any model is always
only an approximation to reality, and will therefore have inherent limitations
associated with it. The most important set of models for which limits have
been calculated are the probabilistic ones. These models require the
estimation of certain probabilities from sample information. If one assumes
that these probabilities can be calculated perfectly then in some sense the
model cannot be improved. Therefore using perfect or complete information
for the probabilities should lead to the best possible retrieval under the
model.
One must be very careful in thinking about the limitations imposed by
certain models. For probabilistic models there are at least two levels of
approximation. At one level the model is approximating the particular
process or structure identified as determining retrieval effectiveness, at
another level one is estimating (or approximating) the parameters of the
particular model. A good example of this can be seen when constructing the
function P(x|C), where C might be relevant or non-relevant. As a first step
one decides on the structure of x and its associated distributions. This could
be to assume that the components of x are independent, or partially or fully
dependent. This implies a series of models one of which may be a better
approximation than any of the others. Once the particular model has been
settled one attempts to estimate its parameters. Now it is impossible to
establish which is the correct model; one can only demonstrate by experiment
that one model leads to better performance than another, and assume that,
because the experiment was suitably randomized, any future experiment will
show the same result.
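The two levels of approximation can be made concrete with the simplest case, the independence assumption for the components of x. This is a minimal sketch, not the chapter's own procedure: the data are invented binary term vectors, and the estimator is a plain maximum-likelihood frequency count:

```python
# Model level: assume the components of x are independent given C, so that
#   P(x | C) = product over i of P(x_i | C).
# Estimation level: approximate each P(x_i = 1 | C) by its sample frequency.

def estimate_term_probs(docs):
    """Maximum-likelihood estimates of P(x_i = 1 | C) from a sample of docs."""
    n = len(docs)
    k = len(docs[0])
    return [sum(d[i] for d in docs) / n for i in range(k)]

def p_x_given_c(x, probs):
    """P(x | C) under the independence model."""
    p = 1.0
    for xi, pi in zip(x, probs):
        p *= pi if xi else (1.0 - pi)
    return p

relevant = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 1, 0)]  # invented sample
probs = estimate_term_probs(relevant)   # [0.75, 0.75, 0.25]
print(p_x_given_c((1, 1, 0), probs))    # 0.75 * 0.75 * 0.75 = 0.421875
```

Swapping in a partially or fully dependent structure changes the model level, while using a better estimator (a Bayesian smoothed count, say) changes only the estimation level; experiments can compare the resulting retrieval performance, but cannot establish which structure is the correct one.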
Although in the above discussion I have blithely talked about perfect or
complete information for estimating parameters, in reality we never have
this. Our information fails to be perfect in at least two important ways. First,
our documents are only a sample of a population of potential documents, and
so even though we use all the information in the collection available to
calculate our so-called perfect estimates, they are not the population
parameters. Secondly, in most collections for which relevance judgements
have been made, they are rarely 100 per cent exhaustive although they may
be 80 per cent exhaustive, and so our perfect information falls short by 20 per
cent.
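The first shortfall, that a whole-collection estimate is still only a sample statistic, can be seen in a tiny simulation. The population parameter and collection size below are invented for illustration:

```python
import random

random.seed(0)
P_TRUE = 0.3   # population parameter, unknowable in practice
# A collection of 500 documents drawn from the population: each document
# either contains some term of interest (1) or does not (0).
collection = [1 if random.random() < P_TRUE else 0 for _ in range(500)]

# Our 'perfect' estimate uses every document in the collection, yet it is
# still only an estimate of P_TRUE, not the population parameter itself.
p_hat = sum(collection) / len(collection)
print(p_hat)
```

Even with complete information about the collection, p_hat will generally differ from P_TRUE; the second shortfall, incomplete relevance judgements, biases the estimate further by removing a portion of the relevant documents from the count.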
Upper bounds to retrieval effectiveness can be used in different ways: (1)
as a general experimental yardstick for performance, (2) to compare the