from the point of view of test evaluation, usefully organized by the five specific topics. The work of 1968-1978 is more variegated than that of the previous decade, and it is less easy to describe in a tidy way. Concrete comparisons between tests are more difficult to make, and comprehensive generalizations about groups of tests cannot always be provided. In some groups no tests stand out as especially important; however, for the decade as a whole we can single out the tests or sets of tests done by Aitchison et al.[49], Keen[51, 52], Vaswani and Cameron[78], Miller[68-70], UKCIS[53, 54, 71-73], and perhaps Sparck Jones[75-77, 79, 80, 93-95] as significant in terms of scope, conduct or result. Some tests, like Aitchison et al.'s and Keen's, resembled Cranfield 2 in touching on a wide range of questions. The Smart Project work as a whole is very important[4, 63-66, 96-98].

Turning now to individual tests, or more particularly experiments, both important and representative, the question is what changes and developments are detectable in their objectives, forms, and results. As in the discussion of the work of the decade 1958-1968, the tests will be considered first from the substantive, and then from the methodological, points of view; but in this case all five groups will be treated substantively before methodological questions are considered.

Index language tests

The tests in the first group were focused on comparisons between different indexing languages. This group is exemplified by Jahoda and Stursa's test[99], Cleverdon's three tests[87, 100, 101], and those of Aitchison et al.[49], Barker et al.[53, 54], Olive et al.[50], and Keen[51, 52]. Jahoda and Stursa compared single subject access with a KWIC index, Cleverdon compared controlled thesaurus-type languages with natural language, Keen's ISILT compared several controlled languages with natural language, and Aitchison et al. and Barker et al. compared chiefly different natural language texts like titles and abstracts. Smart Project experiments included manual controlled versus automatic natural language comparisons in the Medlars tests[4], and Miller, working on searching, tested controlled MeSH versus natural language[68, 69]; Evans compared manually and automatically assigned thesaurus terms[86], and Klingbiel and Rinker manual and semi-automatic natural language indexing[85]. Keen's printed subject index investigation, EPSILON, can also be regarded, though the emphasis is on searching, as partly a language test[60]. The most conspicuous feature of these tests is the inclusion of natural language; index language tests in the previous decade were typically confined to different forms of controlled language.
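As a point of clarification of the KWIC format that Jahoda and Stursa set against single subject access: a keyword-in-context index makes every significant word of a title an access point, displayed with its rotated context. The following Python sketch is purely illustrative; the titles and stopword list are invented for demonstration and are not drawn from Jahoda and Stursa's test.

    # Illustrative sketch of KWIC (keyword-in-context) index generation.
    # Titles and stopwords are invented; they do not come from any test
    # collection discussed in this chapter.

    STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}

    def kwic_index(titles):
        """Return (keyword, context, title_no) entries, one per significant word."""
        entries = []
        for no, title in enumerate(titles, start=1):
            words = title.split()
            for i, word in enumerate(words):
                if word.lower() in STOPWORDS:
                    continue  # only significant words become access points
                # Rotate the title so the keyword leads its own entry.
                context = " ".join(words[i:] + ["/"] + words[:i])
                entries.append((word.lower(), context, no))
        return sorted(entries)  # alphabetical by keyword, as in a printed index

    titles = [
        "Evaluation of indexing languages",
        "Natural language searching of machine files",
    ]
    for keyword, context, no in kwic_index(titles):
        print(f"{keyword:12} {context}  [{no}]")

Each title thus appears once per significant word, which is what distinguishes a KWIC index from a single-access subject listing.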
The inclusion of natural language, represented either by manually selected keywords or by automatically searchable titles or abstracts, must be seen as responding in part to the findings of earlier projects like Cranfield 2 (this was indeed explicitly the case in, for example, Aitchison et al.'s test), and in part to the increasing use of machine files, for which title searching in particular is especially appropriate. The cost of using a controlled language with very large files, whether for indexing or searching, must be a contributing factor too. Some of the tests, like Cleverdon's DOAE test[100] and Keen's, explicitly covered dependent variables like indexing exhaustivity, and Aitchison et al. included question 1.
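Indexing exhaustivity, one of the variables just mentioned, is conventionally understood as the number of terms assigned to a document. A minimal Python sketch, with an invented document and an invented frequency cut-off rule for term selection (not the procedure of any test discussed here), shows how exhaustivity can be varied as an experimental parameter:

    # Illustrative sketch: varying indexing exhaustivity, i.e. the number
    # of terms assigned per document. The document and the selection rule
    # (most frequent longer words first) are invented for demonstration.

    from collections import Counter

    def index_terms(text, exhaustivity):
        """Assign up to `exhaustivity` terms to a document, most frequent first."""
        words = [w.lower().strip(".,") for w in text.split()]
        counts = Counter(w for w in words if len(w) > 3)  # crude stopword filter
        return [term for term, _ in counts.most_common(exhaustivity)]

    doc = ("Thesaurus control of natural language terms against "
           "natural language title searching of machine files")

    for level in (3, 6):
        print(f"exhaustivity {level}: {index_terms(doc, level)}")

Raising the exhaustivity level assigns more terms per document, which tends to raise recall at the cost of precision; hence its explicit coverage as a variable in tests like Cleverdon's DOAE test.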