IRE Information Retrieval Experiment The methodology of information retrieval experiment chapter Stephen E. Robertson Butterworth & Company Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

trivial problem, since one must use the results of an initial search without feedback before trying the feedback procedure. Further, in this case (unlike the last) there are no obvious solutions to be brought in from outside the field. There are in fact two methods in use at present: 'residual ranking', which involves removing the documents obtained by the initial search from the collection (a different set for each query); and 'half collection' experiments, where the initial search is done on one half of the collection and the feedback is applied to the other half.

But in general, there has not been as much application of experimental design ideas in retrieval experiments as perhaps there should. This may be in part to do with the fact that so many of the variables of interest are difficult to control directly; but we might reasonably expect more such application in the future.

The limitations of statistics

Following this discussion of statistical ideas, two general points may be made. First, statistical problems are pervasive in retrieval tests; second, statistical and other considerations are closely intertwined. The process of drawing conclusions, of any sort, from the results of a test involves calling on various ideas, some of a statistical nature and some not; both sets of ideas are necessary, and they are not easily separable.
Unfortunately, many of the basic statistical problems are difficult ones, not necessarily solvable in terms of textbook methods; indeed many of them have not yet been solved. So the extent to which any experimenter can use formal statistical methods when the situation demands is severely limited. Experimenters have been in the past, and will continue to be, forced to rely on ad hoc methods and statistical intuition. I hope, of course, that the necessary basic work will be done for new methods to be developed; but in the meantime, I hope that the above discussion will encourage an awareness of the nature of the problems, as an aid to intuition.

2.5 Conclusions

There is no such thing as a watertight method for evaluating an information retrieval system. There is, on the other hand, a considerable battery of methods and techniques for dealing with the various problems that arise in this endeavour. Furthermore, each new test throws up new problems, or brings out inadequacies in traditional solutions. So the archetype I have described is a fluid concept, which will no doubt change as much in the next twenty years as it did in the last. If, in 2001, this entire chapter is obsolete, so much the better!

Bibliographic notes

Barring cross-references to other chapters, the text of this chapter has deliberately been left without references, in the interests of readability.