IRE
Information Retrieval Experiment
Retrieval effectiveness
chapter
Cornelis J. van Rijsbergen
Butterworth & Company
Karen Sparck Jones
This analogy between relevance and an observable in quantum mechanics
(QM) should not be taken too seriously; its use is mainly to highlight the
inherent uncertainty associated with relevance. One interesting aspect of the
analogy is, however, that a further similarity between the formalism for
information retrieval and QM becomes apparent when one considers the
well established trade-off between precision and recall. This is similar to the
Heisenberg uncertainty principle in physics, where for example momentum
and position cannot be measured simultaneously to any desired level of
accuracy: increasing the accuracy for one leads to a necessary decrease in
accuracy for the other. Similarly, in attempting to increase precision we
always find a decrease in recall. In fact, under some mathematical models in
information retrieval the trade-off is a necessary one, and not simply observed
empirically.
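The trade-off can be made concrete by computing precision and recall at successive cut-off points in a ranked output list. The following sketch uses an invented ranked list and an invented set of relevance judgements, purely for illustration; the usual pattern is that recall never decreases as the cut-off deepens, while precision tends to fall.

```python
# Illustrative only: the ranked list and relevance judgements are invented.
ranked = ["d3", "d1", "d7", "d2", "d9", "d4", "d8", "d5"]  # system output, best first
relevant = {"d1", "d2", "d4", "d6"}                        # judged relevant in advance

for k in range(1, len(ranked) + 1):
    retrieved = ranked[:k]
    hits = sum(1 for d in retrieved if d in relevant)
    precision = hits / k              # proportion of retrieved that are relevant
    recall = hits / len(relevant)     # proportion of relevant that are retrieved
    print(f"cutoff {k}: precision={precision:.2f} recall={recall:.2f}")
```

Running this shows recall climbing monotonically with the cut-off while precision drifts downward, which is the empirical form of the trade-off described above.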
It must be emphasized that measuring retrieval effectiveness is a form of
derived, as opposed to fundamental, measurement. The fundamental
quantity involved is relevance; once this has been established we can
attempt to measure retrieval effectiveness. Because of the difficulties involved
in establishing a theory of relevance, relevance-based measures have typically
been used in an experimental (artificial as opposed to operational) context,
that is, in one where the relevance of a document has been decided in
advance. Given such an experimental set-up, it would appear that retrieval
strategies can be evaluated for their effectiveness in terms of, say, precision
and recall without any difficulty. Unfortunately life is not that simple, and
difficulties arise with the form of measurement at different levels. A few are
as follows:
(1) Sampling level?
(2) One or more variables?
(3) How to normalize for relevance feedback information?
(4) Effect of ranking on form of measurement?
(5) Effect of interpolation, interpretation?
(6) Effect of averaging technique?
Each of these technical problems, except the first, which has already been
covered by Robertson, will be touched upon in the rest of this chapter.
3.2 Theoretical foundations
The problems of measurement in information retrieval differ from those
encountered in the physical sciences in one important aspect. In the physical
sciences there is usually an empirical ordering of the quantities we wish to
measure. For example, we can establish empirically by means of a scale
which masses are equal, and which are greater or less than others. Such a
situation does not hold in information retrieval. There is no empirical
ordering for retrieval effectiveness and therefore any measure of retrieval
effectiveness will by necessity be artificial.
The basic variables underlying any measure of retrieval effectiveness are
usually precision and recall, or some other equivalent pair. The conventional
way to define these is in terms of ratios; however, more recently it has proved
fruitful to define them as probabilities. Recall is defined as the probability