IRE: Information Retrieval Experiment
Chapter: The methodology of information retrieval experiment
Stephen E. Robertson
Butterworth & Company
Karen Sparck Jones

Introduction

against some standard, or in terms of some criteria of success or failure, and then look for possible ways of approaching the standard or reducing the failures. Such a test would be more on the lines of an investigation as defined in the editor's introduction.

Moving into the sphere of basic research, we may be concerned with the general principles of information retrieval system design. Again we have the distinction between experiments designed to help choose between alternative general principles, and investigations aimed at the discovery of new principles. Another objective for investigation might be to test the feasibility of some particular design principle: that is, to test whether a system can be designed on the basis of such a principle.

All these are good reasons for wanting to test information retrieval systems. They are also all different, and impose different requirements and constraints on the conduct of the test. So even before we get down to the pragmatic level of how best to do things in different circumstances, we find that the question 'How should we test an information retrieval system?' has many answers. Readers are invited to bear this in mind for the rest of the book!

The archetypal retrieval test

At this point, it is worth describing the general form of a retrieval test, as it has evolved over the last 20 years.
This is not to say that this form is correct, or that any of the many variants of it are peculiar in any respect; it is merely to establish a reference point on which to base further discussion.

What constitutes a test of a retrieval system or systems? First of all we have to have the system itself: that is, the set of rules and procedures, and human or mechanical operators of these rules and procedures. Next, we must have the raw material on which the system works: the documents and requests.

Tests in general, and experiments (in the sense defined) in particular, are normally intended to answer specific questions. An important component of any test is the experimental design: that is, the way in which the test is organized in order to answer the appropriate questions.

All tests involve some kind of measurement, in the widest sense of the word. In most information retrieval system tests, this includes (among other things) some form of assessment of the system's response to each query.

Generally, a document collection will contain documents that might have been useful in the context of a particular query, but which the system does not retrieve. Many experiments include attempts to discover some or all of these documents, with a view to assessing the performance of the system against some standard.

Finally, we must have methods of analysing the results, in such a way as to allow us to draw the appropriate conclusions, to answer the questions with which we set out. All these aspects are discussed further below.

Operational versus laboratory tests

In order to answer a specific question or questions unambiguously, a test must be designed as far as possible to exclude any extraneous variations