which may confuse the results; hence the idea of conducting experiments under laboratory conditions, with all variables controlled as far as possible. On the other hand, in order to answer questions which relate directly to real problems in the design of retrieval systems, and to provide answers which will apply in real situations, a test must be conducted in (as nearly as possible) an operational environment. The conflict between these two aims is a real and continuing one. As a result, a whole spectrum of testing methods has been developed, ranging from pure laboratory experiments to the study of real systems and users in their operating environment. This spread of methods is reflected in the various comments that follow on components of the 'normal' retrieval test.

2.2 Components of the archetype

The system

That we need an information retrieval system in order to do a retrieval test is not quite as obvious as might at first appear. If we go back to the question(s) that gave rise to the test in the first place, they will very often revolve around some particular component of the system: the index language, say, or the indexing process, or the process of search formulation. Suppose, then, that we are concerned with the indexing process. Do we even need a searching process in order to do an experiment? Is there no way we could test the alternative indexing processes directly, without doing any searching?

The short answer is: no, there is no satisfactory way. By and large it is not possible (at present) to set up criteria for the indexing process which we can be confident will relate to the overall performance of the information retrieval system in the right way. It follows that, if we want to decide between alternative indexing strategies, for example, we must use these strategies as part of a complete information retrieval system, and examine its overall performance (with each of the alternatives) directly. This is a severe constraint on any test: it is as if, lacking the necessary theories of mechanics and the strength of materials, bridge designers had to build numbers of test bridges to test each material and each structural component which they might use in their designs. (The recent problems with box-girder bridges suggest that the idea is not so far-fetched!)
The second, related problem area under this heading concerns the definition of the boundaries of the system, particularly in connection with the user. The usual view of the information retrieval system manager is that the (narrow) system is that which is under his control; the user normally falls outside this narrow definition. But there are two strong reasons for including at least some of the cognitive processes that the user goes through in the definition of the system for the purpose of the test:

(1) Some of the processes that users go through can be influenced by narrow-system components such as the index language;
(2) In terms of the arguments immediately preceding, we do not know how to relate narrow-system behaviour to wide-system performance.

Both points relate to the idea discussed above, that the function of an information retrieval system is to help the user to satisfy his/her information