IRE Information Retrieval Experiment
Retrieval effectiveness chapter
Cornelis J. van Rijsbergen
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

42 Retrieval effectiveness

as the set of points at which there is a change of recall, and secondly an interpolation function

P(R) = sup{P' : (R', P') ∈ G and R' ≥ R}

This is simply an algebraic expression for the interpolating step-function with the jumps occurring at the points (R', P') of G. One consequence of defining the function in this way is that certain points belonging to G may be 'ignored'. To see this consider two neighbouring points (R1, P1) and (R2, P2) such that R2 > R1 but P2 ≥ P1 (normally one would expect P2 < P1 because of the trade-off). Then in interpolating for a value R0 immediately preceding R1 the P1 value will not be used, since P(R0) = P2. In an earlier publication I referred to (R1, P1) as an anomaly, emphasizing that it could legitimately be ignored7.

Other limits to retrieval effectiveness are due to the nature of the mathematical model used to represent any subsystem. Any model is always only an approximation to reality, and will therefore have inherent limitations associated with it. The most important set of models for which limits have been calculated are the probabilistic ones. These models require the estimation of certain probabilities from sample information. If one assumes that these probabilities can be calculated perfectly then in some sense the model cannot be improved.
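The step-function interpolation described above can be sketched in a few lines of code. The function and data below are illustrative assumptions (the set G of (recall, precision) points is invented), not from the original text; they show how a point with higher recall and equal-or-higher precision causes an earlier point to be ignored.

```python
def interpolated_precision(r, points):
    """Interpolating step-function P(R) = sup{P' : (R', P') in G and R' >= R}:
    the highest precision among points whose recall is at least r."""
    candidates = [p for (rr, p) in points if rr >= r]
    return max(candidates) if candidates else 0.0

# G: hypothetical observed points. (0.4, 0.8) is 'anomalous' because its
# neighbour (0.5, 0.85) has both higher recall and higher precision.
G = [(0.2, 0.9), (0.4, 0.8), (0.5, 0.85), (0.8, 0.3)]

# Interpolating at a recall just below 0.4 skips P1 = 0.8 and uses P2 = 0.85.
print(interpolated_precision(0.35, G))  # 0.85
```

Note that this makes P(R) monotonically non-increasing in R, which is exactly why the anomalous point can legitimately be dropped.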
Therefore using perfect or complete information for the probabilities should lead to the best possible retrieval under the model. One must be very careful in thinking about the limitations imposed by certain models. For probabilistic models there are at least two levels of approximation. At one level the model is approximating the particular process or structure identified as determining retrieval effectiveness; at another level one is estimating (or approximating) the parameters of the particular model. A good example of this can be seen when constructing the function P(x|C), where C might be relevant or non-relevant. As a first step one decides on the structure of x and its associated distributions. This could be to assume that the components of x are independent, or partially or fully dependent. This implies a series of models, one of which may be a better approximation than any of the others. Once the particular model has been settled one attempts to estimate its parameters. Now it is impossible to establish which is the correct model; one can only demonstrate by experiment that one model will lead to better performance than some other model, and assume that, because our experiment is properly randomized, any future experiment will show the same result.

Although in the above discussion I have blithely talked about perfect or complete information for estimating parameters, in reality we never have this. Our information fails to be perfect in at least two important ways. First, our documents are only a sample of a population of potential documents, and so even though we use all the information in the collection available to calculate our so-called perfect estimates, they are not the population parameters. Secondly, in most collections for which relevance judgements have been made, they are rarely 100 per cent exhaustive although they may be 80 per cent exhaustive, and so our perfect information falls short by 20 per cent.
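The two levels of approximation can be made concrete with a small sketch. Everything here is a hypothetical illustration, not the chapter's own formulation: the first level is the choice of model structure (below, term independence, so P(x|C) factorizes over terms), and the second is estimating that model's parameters from a sample rather than the population.

```python
def estimate_term_probs(docs):
    """Level two: estimate P(term present | C) by relative frequency
    over a sample of documents from class C (sets of terms)."""
    n = len(docs)
    counts = {}
    for doc in docs:
        for term in doc:
            counts[term] = counts.get(term, 0) + 1
    return {t: c / n for t, c in counts.items()}

def p_x_given_c(x, probs, vocab):
    """Level one: the independence assumption
    P(x | C) = product over terms of p_t (if t in x) or (1 - p_t)."""
    p = 1.0
    for t in vocab:
        pt = probs.get(t, 0.0)
        p *= pt if t in x else (1.0 - pt)
    return p

# Hypothetical sample of relevant documents, each a set of terms.
relevant = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}]
probs = estimate_term_probs(relevant)  # sample estimates, not population values
vocab = {"a", "b", "c"}
print(p_x_given_c({"a", "b"}, probs, vocab))  # 0.75 * 0.5 * 0.75 = 0.28125
```

A dependence model over the same sample would give different estimates, which is why the two sources of error (wrong structure, imperfect estimates) have to be kept apart when reasoning about limits on performance.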
Upper bounds to retrieval effectiveness can be used in different ways: (1) as a general experimental yardstick for performance, (2) to compare the