IRE Information Retrieval Experiment
Retrieval system tests 1958-1978 (chapter)
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

experiments. An important difference between the tests in the two groups was that, on the whole, the tests on manual indexing were evaluation tests of system performance, while more of the work on automatic indexing was concerned solely with demonstrating that automatic indexing was feasible and produced plausible output: this applies to the studies done by O'Connor, Borko, Stone and Rubinoff, to A. D. Little's NASA study, and indeed to most of the work reported in the two Stevens volumes. Within the two groups, some tests can be described as experiments proper, involving some degree of control of variables and explicit comparison, ordinarily between different indexing languages. The Cranfield and CWRU studies fall into this category. Others, like Lancaster's Medlars test, were investigations. Some of the comparative tests, like those of Schuller or Cohen et al., as well as the investigations of Lancaster, were directly related to operational systems; others, including virtually all of the work on automatic indexing, were laboratory studies. The fact that most of the work emphasized the indexing language used gives the research of the decade a distinctive character. Indeed, 1968 genuinely marks the end of one phase of research. For manual indexing, it could be called the Cranfield decade. The relatively uncontrolled comparisons of Cranfield 1 were followed by the more detailed tests of Cranfield 2.
The experience gained in different Cranfield projects was, moreover, applied in, e.g., the Herner et al. study of the Bureau of Ships system, the Medlars investigation, and the CWRU Comparative Systems Laboratory work. The CWRU Report of 1968¹⁴ essentially constitutes an extended presentation of the testing methods developed in this whole context, and can be said to summarize the experience gained during the period. At the same time, it was evident by 1968 that automatic indexing raised more problems than had been expected: the effort and difficulty involved in conducting well-organized and informative tests were clearly shown by Dennis' heroic experiments. Salton's book, Automatic Information Organisation and Retrieval³⁰, nevertheless marked the beginning of a new period, since it emphasized the whole range of novel possibilities for information retrieval systems made available by computers, and the importance of viewing an automated information system as an integrated whole. Overall, the conclusion to be drawn from the work of the decade was expressed in the CWRU Report: the indexing language used is much less important in determining system performance than had been supposed. Given this general characterization of tests between 1958 and 1968, we can now consider the objective, form and results of the different projects in more detail. This will involve both the substantive and the methodological properties of the tests. In the discussion I shall treat the two groups of manual and automatic indexing tests separately since, as already noted, they were very different in character.

Index language tests

We start with the indexing language tests, and first consider them substantively. As noted, these tests included the classical evaluation studies carried out at Cranfield and Western Reserve. The Cranfield 1 and 2²,³