IRE Information Retrieval Experiment Retrieval system tests 1958-1978 chapter Karen Sparck Jones Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

inevitable. In particular, it is virtually impossible to apply any rigorous definition of a 'unit' test or 'unit' experiment as, say, an explicit comparison between two values of a primary variable, with all other variables held constant, or perhaps a comparison between two values of a primary variable for two of some secondary variable. This is to some extent because definitions would lead to intolerable detail, but also because much reported work is rather difficult to characterize consistently at this level: this in turn is partly because, as noted above, retrieval system behaviour is not well characterized in terms of its components. Some large or continuing projects can indeed be described as conducting series of tests. But in general, an individual test will be taken, informally, as whatever the authors of a paper regard as a test, which is chiefly a matter of objectives. This has the advantage of matching the authors' own views of tests in terms of their primary variables, but the disadvantage of failing to take full account of the information embodied in multi-variable tests. That is, where authors are interested in the behaviour of a primary variable subject to the variation of one or more secondary variables, we may turn the test upside down and view the secondary variables as primary.
However, attempting to examine the mass of tests done from all points of view would be impossible, so, though some alternative views will be noted, these will be rather limited, and will be mainly those recognized by the research workers responsible for large, multi-variable tests.

12.3 The decade 1958-1968

The year 1958 is a natural starting point for the historical account. The 1958 Washington International Conference on Scientific Information was widely felt to mark new developments in documentation and information retrieval, specifically the appearance of a new intellectual tool, post-coordination, and a new physical tool, the computer. Luhn's auto-abstracts of conference papers may be taken as a symbol of the possibilities then perceived for automatic information processing. Research work in the following decade, and especially in the earlier part of the 1960s, was dominated by studies comparing newer post-coordinate indexing, perhaps involving a thesaurus, with older classificatory approaches. The expansion of computing was associated on the one hand with research on fully automatic indexing and searching systems, and on the other with work on automated searching. As had already been demonstrated by the use of punched card machines, post-coordination was especially suited to automation, and formed the basis of studies of automatic indexing and searching. Research on statistically-based indexing, stimulated by Luhn, was especially prominent in the early 1960s. It was soon recognized that identifying indexing keys by direct automatic content analysis was not a realistic shorter-term aim, and statistical techniques for extracting information about words and word relations were proposed as substitutes. There was considerable enthusiasm for automation, and optimism about its potentialities, reflected in the effort devoted to machine translation.
The hardware and software limitations of the machines available nevertheless made research into all kinds of automatic information processing methods very difficult. Post-coordination and automation were essentially responses to the