IRE Information Retrieval Experiment
Retrieval effectiveness chapter
Cornelis J. van Rijsbergen
Butterworth & Company
Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

3 Retrieval effectiveness
Cornelis J. van Rijsbergen

3.1 Introduction

Information storage and retrieval systems have been with us for many years now. Attempts to evaluate or measure their performance have been going on almost as long. This is not an entirely unrelated development, since in designing and building any new system the question of its desirability, quality, value and benefit should arise naturally. In evaluating information storage and retrieval systems, that is, those that deal with the retrieval of references to documents, much of the effort has gone into measuring variables based on the relevance of documents to the question put to the system. This aspect of evaluation is clearly only one part of the overall evaluation of any retrieval system. These relevance-based variables are chosen to reflect in some way what has become known as retrieval effectiveness: the ability of the system to retrieve relevant documents while at the same time suppressing the retrieval of non-relevant documents. The best-known pair of variables jointly measuring retrieval effectiveness is precision and recall, precision being the proportion of the retrieved documents that are relevant, and recall being the proportion of the relevant documents that have been retrieved.
Singly, each variable (or parameter, as it is sometimes called) measures some aspect of retrieval effectiveness; jointly they measure retrieval effectiveness completely.

The measurement of precision and recall, or of any other similar pair of variables, is different in many respects from the measurement of variables in, say, the physical sciences. Each variable is based on the availability of data about the relevance of particular documents to a query. Although one can make a case for an objective notion of relevance, many researchers believe that relevance is entirely subjective: the same query, put by different users, will lead to different documents being judged relevant. In this respect relevance behaves more in the way an observable behaves in quantum physics, since its measured value is not determined except in probability. The distribution of values associated with an observable will follow a certain probabilistic law determined by the state of the system. Unfortunately, in information retrieval a similar probabilistic law for relevance does not exist. Hypotheses about the user population could be formulated to establish such a law, but its usefulness would be doubtful.
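
The definitions of precision and recall given in the introduction, and their dependence on who supplies the relevance judgements, can be illustrated with a minimal sketch. The function name `precision_recall`, the document identifiers and the two users' judgements are all hypothetical and chosen only for illustration; returning 0.0 when the retrieved or relevant set is empty is an assumed convention, not one stated in the text.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    precision: proportion of the retrieved documents that are relevant
    recall:    proportion of the relevant documents that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # documents both retrieved and relevant
    # Assumed convention: an empty denominator yields 0.0.
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical output of one system for one query.
retrieved = {"d1", "d2", "d3", "d4"}

# Hypothetical relevance judgements made by two users for the same query.
judgements = {
    "user_a": {"d1", "d2", "d5"},
    "user_b": {"d2", "d3", "d4", "d6", "d7"},
}

# The same retrieved set is scored once per set of judgements.
results = {
    user: precision_recall(retrieved, relevant)
    for user, relevant in judgements.items()
}

for user, (p, r) in results.items():
    print(f"{user}: precision={p:.2f} recall={r:.2f}")
```

With these made-up judgements the same retrieved set scores differently for each user, which is the sense in which the measured values depend on subjective relevance data rather than on the system alone.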