IRE
Information Retrieval Experiment
Retrieval effectiveness
chapter
Cornelis J. van Rijsbergen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
3
Retrieval effectiveness
Cornelis J. van Rijsbergen
3.1 Introduction
Information storage and retrieval systems have been with us for many years
now. Attempts to evaluate or measure their performance have been going on
almost as long. The two developments are not unrelated, since in
designing and building any new system the question of its desirability,
quality, value and benefit should arise naturally. In evaluating information
storage and retrieval systems, that is, systems that retrieve references
to documents, much of the effort has gone into measuring variables based on
the relevance of documents to the question put to the system. This aspect of
evaluation is clearly only one part of the overall evaluation of any retrieval
system. These relevance-based variables are chosen to reflect in some way
what has now become known as the retrieval effectiveness: the ability of the
system to retrieve relevant documents while at the same time suppressing the
retrieval of non-relevant documents. The best-known pair of variables
jointly measuring retrieval effectiveness are precision and recall, precision
being the proportion of the retrieved documents that are relevant, and recall
being the proportion of the relevant documents that have been retrieved.
Singly, each variable (or parameter, as it is sometimes called) measures some
aspect of retrieval effectiveness; jointly they measure retrieval effectiveness
completely.
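As a minimal sketch of the two definitions, assume a single query whose retrieved documents and relevant documents are both available as sets of document identifiers. The function name `precision_recall`, the identifiers `d1`...`d10` and the ten-document collection are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def precision_recall(retrieved: set, relevant: set) -> tuple[float, float]:
    """Return (precision, recall) for one query.

    precision = |retrieved & relevant| / |retrieved|
    recall    = |retrieved & relevant| / |relevant|
    """
    hits = len(retrieved & relevant)
    # Returning 0.0 for an empty denominator is a convention assumed in this sketch.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgements for one query.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d2", "d4", "d7", "d9", "d10"}
print(precision_recall(retrieved, relevant))  # (0.5, 0.4)

# Retrieving the whole hypothetical ten-document collection gives full recall
# without improving precision: either measure alone can be made to look good.
collection = {f"d{i}" for i in range(1, 11)}
print(precision_recall(collection, relevant))  # (0.5, 1.0)
```

The second call shows why the two variables are reported together: a system that simply returns everything achieves perfect recall, and only precision reveals the cost.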
The measurement of precision and recall, or of any other similar pair of
variables, is different in many respects from the measurement of variables in,
say, the physical sciences. Each variable is based on the availability of data
about the relevance of particular documents to a query. Although one can
make a case for an objective notion of relevance, many researchers believe
that relevance is entirely subjective; that is, if the same query is put by
different users, different documents will be judged relevant. In this respect
relevance behaves more like an observable in quantum
physics, whose measured value is determined only in probability.
The distribution of values associated with an observable will follow a certain
probabilistic law determined by the state of the system. Unfortunately, in
information retrieval no similar probabilistic law for relevance exists.
Hypotheses about the user population could be formulated to establish such
a law, but its usefulness would be doubtful.
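The dependence of the measured values on whose relevance judgements are used can be illustrated with a small sketch. The retrieved set is held fixed and only the judgements of two hypothetical users, `user_a` and `user_b`, differ; all data here are invented for illustration.

```python
# Hypothetical data: one query, one retrieved set, two users' relevance judgements.
retrieved = {"d1", "d2", "d3", "d4"}
judgements = {
    "user_a": {"d2", "d4", "d7"},
    "user_b": {"d1", "d2", "d3", "d8"},
}


def precision(retrieved: set, relevant: set) -> float:
    # Fraction of retrieved documents judged relevant (assumes retrieved is non-empty).
    return len(retrieved & relevant) / len(retrieved)


def recall(retrieved: set, relevant: set) -> float:
    # Fraction of relevant documents that were retrieved (assumes relevant is non-empty).
    return len(retrieved & relevant) / len(relevant)


results = {
    user: (precision(retrieved, relevant), recall(retrieved, relevant))
    for user, relevant in judgements.items()
}
for user, (p, r) in results.items():
    print(f"{user}: precision={p:.2f} recall={r:.2f}")
# user_a: precision=0.50 recall=0.67
# user_b: precision=0.75 recall=0.75
```

The system output is identical in both cases; only the relevance data change, and with them the measured effectiveness.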