IRE Information Retrieval Experiment The Cranfield tests chapter Karen Sparck Jones Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

tests in practice limited performance measurement, be interpreted as referring primarily to effectiveness.

13.1 Cranfield 1

The ancestor of Cranfield 1 was a pilot experiment in the use of Uniterms for indexing aeronautical documents, which was carried out at Cranfield in 1953. The data for the test consisted of 40 questions based on source documents, which were searched on 200 documents; performance was measured as the success rate in retrieving the query sources, which was 83 per cent (strictly 82.5 per cent). The test formed part of a group described by Thorne¹ in 1955, involving a comparison between Uniterms and the UDC as used at the Royal Aircraft Establishment, and a subsequent comparison between the NLL specialized aeronautical indexing language, developed at the National Aeronautical Research Institute, Amsterdam, and the UDC. Performance for the 40 questions searched with UDC at RAE was 50 per cent, compared with 85 per cent for Uniterms, while performance for the NLL language using another set of questions was 88 per cent, as opposed to 80 per cent for the UDC on a subset of these. This group of tests, though defective, for example in using different question and document sets, already exhibits some key features of the Cranfield tradition: methodologically, the use of source document queries; substantively, an interest in subject-oriented indexing.
Thorne's account of the set of tests also emphasizes costs, which have been one of Cleverdon's persistent concerns. The part played by Cleverdon's experience in these tests in promoting Cranfield 1 is clearly indicated in the Preface to the first volume of the Cranfield 1 Report². Thus Cleverdon notes that while the specialized NLL scheme developed by 1953 would apparently complement the more general UDC satisfactorily, something cheaper was desirable. Uniterms were a possibility, but the results obtained from a second test following the one just described were not too promising. However, the work on the NLL scheme itself meant that test procedures had to be devised and, as Cleverdon says, 'by this time [1954] I had become convinced that the only way to obtain a valid comparison between systems would be to control conditions in such a way that there was an economic basis for the comparison. At the Conference of the Aslib Aeronautical Group in 1955 I read a paper in which, for the first time, the necessity for controlled experiments was put forward.' (p. ii) This conference was, moreover, partly sponsored by the Classification Research Group, which was actively discussing novel classification techniques; and new approaches to indexing and retrieval were also being put forward in the United States. 'It was clear', Cleverdon continues, 'that claims were being made by proponents which, while possibly correct, could not be considered proven by results; just as clearly many of the arguments being used by opponents of the systems were equally unproven or trivial. It seemed desirable that a serious investigation should be made