Information Retrieval Experiment
Karen Sparck Jones
Butterworth & Company

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Part 1
Testing in general

The four chapters of Part 1 are concerned with the general questions involved in designing and conducting retrieval tests, whether experiments or investigations, in operational or laboratory environments. Retrieval systems are so complicated, and so little understood, that it is easy to do poor tests and, more particularly, poor experiments. These chapters are intended, through their discussion of the theoretical and practical issues involved, both to bring out the problems of conducting information retrieval tests and to show how these problems may be tackled. The chapters provide an account of our present understanding of retrieval systems and system testing which can be used on the one hand as the basis for an assessment of past tests and on the other, more importantly, as the basis for the design of future tests.

For any retrieval test, decisions have to be taken about the variables to be studied, and about how they are to be studied. These decisions rest on assumptions about the character and purpose of retrieval systems, and issue in test designs covering the choice of test data, control of test variables, and representation of test results. In the first chapter, Robertson sets the scene by discussing the essential nature of the relation between a retrieval system, the object of study, and a test of this system, the study itself. His chapter is thus focused on the general methodology of retrieval system testing.
The two cruxes of retrieval system testing are then considered more fully by van Rijsbergen and Belkin. Since retrieval systems have a function, system testing depends on some view of how system performance is to be evaluated, and van Rijsbergen examines the issues involved in characterizing and measuring system effectiveness. However, since retrieval systems deal ultimately with human needs for, and reactions to, information, system testing cannot begin without an interpretation of these underlying concepts of information need and satisfaction. The many problems this presents are the theme of Belkin's chapter.

The final chapter in the section, by Tague, spells out the implications of the general points made in the preceding chapters in terms of the way that specific practical decisions have to be made at every stage in setting up, carrying out, and drawing conclusions from, a retrieval system test. Her chapter makes the connection between system and test at the detailed level, supported by, and supporting, the more general statements of the previous chapters.