IRE Information Retrieval Experiment
Introduction chapter
Karen Sparck Jones

Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

and addressed to similar topics, though with more emphasis on automatic natural language indexing techniques, and more interest in a wider range of system factors. Moreover, in other work motivated by the alternative approach, seeking to base experiments on theory and so explain rather than simply exhibit system behaviour, the influence of Cranfield is often visible. Thus while much experimental work is explicitly indebted to Cranfield 2, much more has been coloured by it. This book is rightly dedicated to Cyril Cleverdon.

The Cranfield test showed what hard work experimental information retrieval is: collecting data, conducting searches, computing performance figures all take substantial time and effort. There is work in the essentials of experiment: maintaining systematic procedures, imposing controls on variables; and there is work in the realities of research: varying approaches in the search for understanding and explanation, and rehashing past studies for consistency and solidity. Salton (Chapter 15) has noted that the Smart Project has involved thousands of tests, implying a large use of man and machine power; yet there are important aspects of retrieval system use and behaviour which have not been studied in these tests.
Since information is claimed to be of increasing importance, at least in twentieth-century high-technology cultures, while information systems are little understood, there is every justification for information system investigation and experiment. Existing information systems reflect long and extensive experience, but it does not follow that such systems cannot be improved, or that new technologies cannot be exploited for wholly new types of system. It is arguable that our current understanding of information processing is like that of sixteenth-century herbalists: it embodies some observation and insight, but lacks detailed analysis and supporting theory. Unfortunately, though many retrieval experiments and investigations have been carried out in the last 20 years, much of the experience gained in conducting tests is not very accessible. Published papers tend to be cleaned-up accounts of objectives, general methods, and results. Project reports may be much fuller, but even here it is often extremely difficult to find out exactly what was done, or why it was done. Though an improvement in the quality of experiments is detectable, far too many of those reported are defective, and in many cases defective in recognized ways for which remedies are available. As all but the most limited tests are major enterprises, it is a pity that so much effort should be wasted. The best that can be said about many reported studies is that even if they are individually dubious, they may collectively point in the same direction. This is perhaps something, but it is not much.

The object of this volume is to make available its contributors' experience in information retrieval testing, in the hope that this will lead to more fruitful and useful testing in the future. The book is designed to treat information retrieval experiment and investigation comprehensively, in a way relevant both to pure research and to operational practice.
Pure research naturally leads to experiment, but system operators may also want to know how their system is working. Both research workers and practitioners may know what they want to find out, but it may not be at all obvious how to find it out, and in particular what the best specific testing methods are. This is clearly seen in evaluation tests. Most tests are intended to evaluate the performance of existing or proposed