Information Retrieval Experiment
Laboratory tests: automatic systems
Chapter by Robert N. Oddy
Edited by Karen Sparck Jones
Butterworth & Company

Information retrieval is the instrumentalist wing of information science. Most laboratory work has been explicitly directed toward the establishment of system design principles. Where information retrieval tests have been effected automatically, the theories under test have almost always been prescriptive. This should not be taken for a truism: if a theory is tested by means of a computer program, or other machine, it does not follow that it is simply prescriptive. It is not in practice always true, and certainly not necessarily true, that a program, constructed as laboratory apparatus, is in some sense a prototype for a real-life system. It is quite possible for a program to act, primarily, as a formalism, or detailed interpretation of, say, a descriptive theory. In fact, in the artificial intelligence field, this is frequently the intention of the programmer[1]. However, within the mainstream of research on automated information retrieval, it happens to be the case that theories have been predominantly prescriptive, and laboratory systems have been put up as potential prototypes. Perhaps it would be realistic to view these computer test environments as engineering workshops rather than as laboratories.
Topics that have been investigated in computer laboratories include classification of index terms[2], document clustering[3-5], automatic indexing and term weighting[6, 7], relevance feedback[8-10], vector space models[8, ?] and probabilistic theories[12, 13]. The usual way of testing the ideas has been to evaluate the ability of retrieval programs based upon them to separate relevant from non-relevant documents. Many aspects of the experimental methodologies used in this type of work derive from those developed by Cleverdon, particularly in the second Cranfield project[14].

The research methodology which has dominated laboratory work on automated information retrieval can be summarized by Figure 9.1. (There are several obvious feedback loops which I have omitted from the picture.) Empirical knowledge (about indexing languages, for instance), combined with the researcher's own intuition, leads him to state some assumptions about the inputs to, and objectives of, an information retrieval system. From these he will attempt to derive a system specification, perhaps by means of a structure of mathematical deductions. Thus, he can build computer programs which create and organize collections of document descriptions, retrieve references in response to compatibly formulated queries, and monitor their own activities for evaluation purposes. The evaluation uses test data, that is, documents and queries chosen by the experimenter and with known characteristics; and it is normally the performance of the system with respect to the objectives that is evaluated, and not the plausibility of the assumptions or theory directly. Over the years the amount of rigour and effort allotted to the various components of the methodology has fluctuated. For example, as experience has been gained with certain classes of retrieval test system, programs have been assembled into flexible packages, so that new mechanisms can more easily be built from the components of old.
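The evaluation style described above, ranking documents against a query with a term-weighting scheme and then checking how well the ranking separates relevant from non-relevant documents, can be illustrated with a minimal sketch. Everything here is invented for illustration: the toy collection, the query, the relevance judgements, and the particular tf-idf weighting are assumptions of this example, not the weighting schemes or test data (such as Cranfield's) used in the experiments the chapter discusses.

```python
# Sketch of a laboratory-style retrieval test: weight terms, rank documents,
# then evaluate the ranking against experimenter-supplied relevance judgements.
import math
from collections import Counter

# Invented toy collection and query (not real test data).
documents = {
    "d1": "indexing and term weighting in document retrieval",
    "d2": "clustering of documents by shared index terms",
    "d3": "weather patterns over the north atlantic",
    "d4": "probabilistic models for relevance weighting of search terms",
}
query = "term weighting for retrieval"
relevant = {"d1", "d4"}  # relevance judgements supplied by the experimenter

def tokens(text):
    return text.lower().split()

# Inverse document frequency: rarer terms carry more weight.
N = len(documents)
df = Counter(t for text in documents.values() for t in set(tokens(text)))

def idf(term):
    return math.log(N / df[term]) if term in df else 0.0

def score(doc_text, query_text):
    """Dot product of tf*idf document weights with idf query weights."""
    tf = Counter(tokens(doc_text))
    return sum(tf[t] * idf(t) ** 2 for t in tokens(query_text))

# Rank the whole collection by descending query-document score.
ranking = sorted(documents, key=lambda d: score(documents[d], query),
                 reverse=True)

# Precision at cutoff 2: fraction of the top two documents that are relevant.
precision_at_2 = len(set(ranking[:2]) & relevant) / 2
```

On this toy data the two relevant documents rank above the non-relevant ones, so precision at cutoff 2 is 1.0; in a real test the experimenter would average such figures over many queries with known characteristics.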
Thus, the effort required in implementing programs has declined. At the same time, there is a new wave of mathematics in information retrieval research[15, 16]. The assumptions are stated more rigorously than before, and the theoretical development prior to system construction has become a focus of attention[5, 13, 17, 18]. Another discernible variation in methodology, with time, is that the