IRE
Information Retrieval Experiment
Retrieval system tests 1958-1978 (chapter)
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

coloured threads to be traced, some of them even standing out as brightly coloured against the overall grey-brown. In my view, the questions that experiments seek to answer should be viewed as bearing on the quality of the experiments. Thus we can evaluate experiments in terms of method, hypothesis, and research or application status, and also in terms of their information retrieval system concern: some aspects of document retrieval systems are more central and important than others, for example searching and matching as opposed to the quality of the abstracts used as a basis for indexing, or the convenience of the online searcher's terminal. The core of an information retrieval system is the document access information, i.e. the character of the indexing data and of the search mechanisms available. The characteristics of the users, of the literature, of the physical and administrative plant, and so on, represent progressively more peripheral environments of the indexing and search functions. Other things being equal, we may therefore rate studies concerned with the core of an information retrieval system as more important than those directed at the periphery. The influence of the experiments that have been carried out can, on the other hand, be dealt with by an historical account.
A chronicle of retrieval experiment does not match the logical characterization just described particularly well, so an historical account of testing is required to balance an evaluative one. The choice and sequence of experiments have naturally been influenced by the challenges posed by the findings of specific tests, but they have also been affected by developments in operational systems and by broader changes in attitudes to information system provision. The remainder of the chapter will therefore be organized as follows. I shall first provide a summary view of the history of information retrieval experiment in its wider context, mentioning noteworthy tests in passing. I shall then consider these and other representative experiments from an evaluative point of view: in relation to their objectives, i.e. their focus, motivation, and underlying assumptions; in relation to their forms, i.e. broadly speaking their data and conduct, which can be itemized, following Bourne's useful scheme⁵, as covering (1) corpus size (requests and documents) and subject, (2) source of the requests, (3) degree of request negotiation with the user, (4) number of relevance levels (excluding non-relevance), (5) status of the relevance judges and basis of their judgements, and (6) performance measures; and in relation to their results, i.e. their findings, the interpretation given to these findings, and their implications. This evaluation will be primarily retrospective, but some reference to what the experiments looked like at the time may be appropriate. Overall, this survey will seek to show whether and how experiments have changed in their objective or type of objective, their form, and their results, and more particularly whether any changes reflect a growth of experience in the conduct of information retrieval tests and in the understanding of retrieval systems. Following the detailed discussion, I shall summarize the main features of the test work done, viewed as a whole.
It turns out that the research of the period covered by the chapter can be naturally divided into that of the decade 1958-1968, and that of the decade