IRE
Information Retrieval Experiment
The Cranfield tests chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

for these. The assessments allowed two grades of relevance: one for documents as relevant as the source, and one for less relevant documents. The results, for both grades and including the source documents, were 75.8 per cent recall with 17.7 per cent precision for the WRU index, compared with 69.5 per cent recall and 33.7 per cent precision for the Cranfield facet index. (Removing the source documents from the calculations reduces recall to 70.6 per cent and 59.1 per cent respectively, and precision to 13.0 per cent and 24.0 per cent (my figures, using Ref. 4, pp. 12-15 and Appendix 3C; there appear to be some discrepancies among the various figures in Ref. 4).) A detailed analysis of the failures showed that searching was responsible for most of them, namely 67.1 per cent, with indexing responsible for 18.4 per cent. In considering the results, Aitchison and Cleverdon say that `before the test started, we were convinced that W.R.U. would be able to achieve one of three results. (a) obtain a high recall figure; (b) obtain a high relevance [i.e. precision] figure; (c) obtain a recall figure and a relevance figure which would both be somewhat higher than that achieved with the Cranfield facet index.
The high level of exhaustivity of the indexing and the complex semantic factoring in the index language gave them the ability to achieve (a); the specific index language, with the added controls would allow them to achieve (b); the combination of these factors could bring about (c).' (p.47) The poor WRU precision figure was then explained by two factors: `the main factor in W.R.U. failures to retrieve relevant documents was the relatively poor standard of many of their search programmes,' (p.48) while an investigation of non-relevant documents showed that `the high level of exhaustive indexing was partly to blame' (p.48). They comment on various features of the test, emphasizing the inverse relationship between recall and precision, and note that the test influenced the work on Cranfield 2, then under way, by sharpening the idea of an index language device.

Cranfield 1 summarized

The main summary account of Cranfield 1 was the Lancaster and Mills⁵ account of 1964, which also discussed the English Electric and Cranfield lj tests. The account emphasizes the need to study indexing itself rather than the manipulation of its products in searching, and comments on the critical role of the Aslib Cranfield project in this. In Lancaster and Mills' view, the most significant results were those relating to recall, showing its inverse relation to precision and comparable performance for the languages studied (including the facet scheme when implemented rather differently than in the main test); to indexing times, showing that a short time is good enough; to indexers, showing that technical knowledge of the subject is not necessary; and to failures, showing inevitable human error to be important. Lancaster and Mills accept Cleverdon's conclusion that the `artificial' questions did not invalidate the results, and themselves conclude that the results were not affected by other factors such as the stopwatch indexing. In Lancaster and Mills' view the English