Information Retrieval Experiment

Retrieval effectiveness. Cornelis J. van Rijsbergen. In: Information Retrieval Experiment, Karen Sparck Jones (ed.), Butterworth & Company.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

... is relevant given a particular description. This probability can be derived from P(x/A) through Bayes' Theorem (see van Rijsbergen³, p. 115). The expected number of relevant documents can now be simply defined as $\sum_{x \in B} P(A/x)$. From this we get:

$$\text{Expected Recall} = \frac{\sum_{x \in B} P(A/x)}{|A|}$$

$$\text{Expected Precision} = \frac{\sum_{x \in B} P(A/x)}{|B|}$$

Defining P(B/A) and P(A/B) in terms of expected recall and expected precision has many advantages; for one, it does not raise the same problems of interpreting the probabilities. Another major advantage is that the trade-off between expected recall and precision can be shown to hold almost immediately for certain forms of retrieval⁴.

We return now to the problem of constructing a measure of retrieval effectiveness. In some ways this is a secondary problem, particularly if one admits that any such measure will be a function of at least two variables such as precision and recall. Nevertheless it may well be possible to construct a sensible measure of retrieval effectiveness independent of the traditional parameters, but this would only be worth doing if it simultaneously led to a coherent theory of information retrieval.
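The expected recall and expected precision defined above can be illustrated with a minimal Python sketch. It assumes that B is the retrieved set, that |A| is a known (or separately estimated) number of relevant documents, and that P(A/x) is already available for each retrieved document. The function name `expected_recall_precision`, the probabilities, and the value of `n_relevant` are all hypothetical and chosen only for illustration. The loop retrieves the top k documents ranked by decreasing probability, which is one form of retrieval (assumed here, not taken from the text) in which the trade-off is easy to see.

```python
from typing import List, Sequence, Tuple


def expected_recall_precision(
    probs_retrieved: Sequence[float], n_relevant: float
) -> Tuple[float, float]:
    """Return (expected recall, expected precision) for a retrieved set B.

    probs_retrieved holds P(A/x) for each document x in B;
    n_relevant plays the role of |A| and is assumed to be known.
    """
    if n_relevant <= 0:
        raise ValueError("n_relevant must be positive")
    if not probs_retrieved:
        # Convention chosen for this sketch: an empty B scores zero on both.
        return 0.0, 0.0
    expected_relevant = sum(probs_retrieved)  # sum over x in B of P(A/x)
    recall = expected_relevant / n_relevant
    precision = expected_relevant / len(probs_retrieved)
    return recall, precision


# Hypothetical probabilities of relevance, one per document in a tiny collection.
collection_probs: List[float] = [0.9, 0.2, 0.6, 0.1, 0.7]
n_relevant = 3  # assumed value of |A|, for illustration only

# Retrieve the top k documents in order of decreasing probability.
ranked = sorted(collection_probs, reverse=True)
curve = [
    expected_recall_precision(ranked[:k], n_relevant)
    for k in range(1, len(ranked) + 1)
]

for k, (recall, precision) in enumerate(curve, start=1):
    print(f"top {k}: expected recall = {recall:.2f}, "
          f"expected precision = {precision:.2f}")
```

With this ranking, expected precision at cutoff k is the mean of the k largest probabilities, so it cannot rise as k grows, while expected recall is a running sum divided by a constant, so it cannot fall.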
One of the present major advantages of measuring retrieval effectiveness in terms of recall and precision (or fallout) is that we are able to state categorically that if retrieval is done in a certain way it will be optimal in terms of its effectiveness measured by recall and precision. It is conceivable that optimality in terms of precision and recall does not result in optimality with respect to some measure of effectiveness, although to date most sensible composite measures are optimized as well.

3.3 Optimal retrieval

One of the more interesting things that has happened in information retrieval research in recent years is that theoretical work on evaluation and on retrieval strategies have fitted together. Of course much earlier Swets tried to do this and his work did cause a flurry of papers, but their impact on further theoretical and experimental work was not felt until much later.

Probably the single most important result in which the definition of retrieval effectiveness and retrieval strategy interact is the probability ranking principle. This principle emerged in the work of Robertson and Sparck Jones⁵, and Cooper⁶. It reads as follows: 'If a reference retrieval system's response to each request is a ranking of the documents in the collection in order of decreasing probability of relevance to the request as submitted by the user, where the probabilities are estimated as accurately as possible on the basis of content derivable