of the points made stemmed from a failure to appreciate the character of the
test as a controlled experiment, others were sound and were explicitly or
implicitly seen to be so and hence were catered for in the planning and
conduct of Cranfield 2. Even so, the critics agreed on the significance of
Cranfield 1 as a retrieval experiment: though its results were not readily
accepted, its status as, in Michael Keen's words, a `pioneering and relevant'
test was recognized not only subsequently, but at the time.
The influence of the Aslib Cranfield Project work in the first half of the
1960s can be seen both in specific tests like Herner and Co.'s Bureau of Ships
investigation11, and, more broadly, in the application of particular lessons to
be learnt from it in retrieval system testing in general. From the system point
of view it suggested that the indexing language might be less important, and
other factors more important, than had been supposed, while from the
methodological point of view it stimulated more careful design, in terms both
of control and realism. For measuring system performance it did much to
promote the use of recall and precision (defined below). That such lessons were learnt from
the Cranfield research is clear from discussions of system testing like
Kyle's12: she explicitly asks `What have we learnt?' and `Where do we go
from here?', and seeks to provide some answers. Some of the Case Western
Reserve University research13 was also a direct response to the Cranfield
work, as Rees indicates14,15. More generally, Cranfield 1 and 1½ led to a great
deal of discussion of retrieval systems and their testing, illustrated by
Cleverdon's argument with Swanson about the Cranfield hypotheses16, and
by the debate at the FID/CR Conference17 in 1964. The wider influence of
the Cranfield experiments on system evaluation at a time when this was
developing, especially in the context of system automation, was therefore
considerable.
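To fix terms (the formulation here is the now-standard one, not a quotation from the Cranfield reports): if a search retrieves some set of documents from a collection in which certain documents are relevant to the query, then

\[
\text{recall} = \frac{\text{number of relevant documents retrieved}}{\text{total number of relevant documents in the collection}}\,, \qquad
\text{precision} = \frac{\text{number of relevant documents retrieved}}{\text{total number of documents retrieved}}\,.
\]

The inverse relation between the two measures discussed below is then the familiar trade-off: casting the search net more widely tends to raise recall while lowering precision, and vice versa.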
13.3 Cranfield 2
However, the major impact of Cranfield 1 and its associated experiments was
on Cranfield 2, which was specifically designed as a development of
Cranfield 1: this is clear from Cleverdon and Mills' account18 of the
philosophy underlying Cranfield 2. Cleverdon, Mills and Keen's view in the
first volume of the Cranfield 2 Report19 was that while Cranfield 1 and 1½
were of general value in demolishing preconceptions about indexing
languages, in showing that operational systems could be readily evaluated, in
providing considerable data, and in encouraging discussions of systems and
their evaluation, they also led to specific hypotheses which were taken as the
basis for the new study. These were seven of Swanson's, namely that 4
minutes for indexing is enough, that technical knowledge is not required, that
systems operate at 70-90 per cent recall and 8-20 per cent precision, that
there is an optimum level of exhaustivity, that there is an inverse relation
between recall and precision, that raising precision 1 per cent lowers recall 3
per cent, and that the most significant Cranfield 1 result was that the four
languages performed the same, plus six more: these were that the most important
factors to be measured in system evaluation are recall and precision; that the
physical form of the store has no effect on performance so measured; that for
the same concept indexing, different languages will perform much the same;