IRE Information Retrieval Experiment Retrieval effectiveness chapter Cornelis J. van Rijsbergen Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature. Lin[OCRerr]it[OCRerr] to retrievd'l 4[OCRerr] rilanner of representation that leads to the limitation on effectiveness. In one [OCRerr]cnse this is true. However, consider the situation where we could characterize md discriminate our documents perfectly, and let us assume that the system is very large and that queries can be arbitrarily complicated. Of course I do riot believe that for an almost infinite collection we can represent documents perfectly with anything less than the complete semantic content (disregarding I rguments as to how this might be done). To achieve perfect retrieval in such I case, one would need to know already what one was looking for, and hence the retrieval system would be irrelevant. The point of a retrieval system is to tell you what you do not already know in response to an incomplete statement of your information required. The computational process of locating likely relevant documents is not unlike the mechanical process of proving theorems within a formal system. If we assume, not unreasonably, that a computational model for retrieval is in power equivalent to arithmetic, then G6del's theorem by analogy also Icads one to suspect that certain statements will be undecidable; or to put it more precisely, there will be documents whose relevance or non-relevance is `undecidable' within the formal system for retrieval, i.e. documents which the system cannot guarantee to find. What I am arguing is that there may be an essential incompleteness about retrieval systems, which although not apparent in current implementations, may well become apparent with increased sophistication of our systems. At the other end of the scale we have performance trivially limited by 0 per cent precision and 0 per cent recall. A less superficial bound is the one set by random retrieval, which is represented most naturally by recall fallout. Any retrieval strategy worth its salt will do better than random. The limit set by random retrieval can be discussed at two levels: a global and a local level. At the global level it simply indicates that we can expect to do better overall by some retrieval strategy than retrieving randomly from the collection. At a local level a limit can be used in two ways. First, to set a natural cut-off when ranking with respect to the probability of relevance; at some point down the ranking the estimated P(relevance/x) will equal the prior probability P(relevance) which is the point at which we are retrieving randomly, and beyond which point the strategy becomes useless. Of course this `random' cut-off point will almost always exceed any cut-off set by the user. But since in most experiments we are dealing with simulated users it is as well to rank to this conservative cut-off point. Secondly, we can use random retrieval at a local level to motivate a particular way of interpolating between precision-recall points. It was pointed out earlier that in the predictive method of evaluation we needed to interpolate. One possible method is by step-function, the jumps occurring at actual changes in recall. One way of motivating, and to some extent justifying, this method is to argue in reverse, and say that given any recall-precision point which has been calculated for the retrieved set then it is always possible to retrieve randomly from this retrieved set. Therefore it is always possible in this way to generate points at the same precision value increasing in recall up to the recall of the given point. Thereby we can interpolate points between any two given points by defining firstly: G = {(R0, P[OCRerr])Y