IRE Information Retrieval Experiment The Cranfield tests chapter Karen Sparck Jones Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

`is so controversial and so unexpected that it is bound to throw considerable doubt on the methods which have been used to obtain [the] results, and our own first reaction was to doubt the evidence. A complete recheck has failed to reveal any discrepancies, and unless one is prepared to say that the whole test conception is so much at fault that the results are completely distorted, then there is no other course except to attempt to explain the results which seem to offend against every canon on which we were trained as librarians.' (p.252)

They conclude that a distorting effect of the subject area cannot be ruled out, but that collection size did not distort the results; that the relation between questions and cited relevant documents is unlikely to have affected them; that indexing failures or omissions are highly unlikely to have occurred often enough to influence them; and that the classifications used were well prepared and so unlikely to have had any untoward effect.
They conclude that, `with the possible doubtful exception of the subject field, there appears to be nothing in the test environment which could be held responsible for serious distortion of the results as between one system and another' (p.262), and continue, `this test has shown that natural language, with the slight modifications of confounding synonyms and word forms, combined with simple coordination, can give a reasonable performance. This means that, based on such practice, a norm could be established for operational performance in any subject field, and it would then be for those who proposed new thesauri, new relational groups, links, or roles, to show how the use of their techniques would improve on the norm.' (p. 263b)

13.4 Criticisms of Cranfield 2

Unfortunately, though it is evident that Cranfield 2 was much more carefully designed than Cranfield 1, it was still open to methodological criticisms. Some of these were made by Vickery20, who points out that the unexpected conclusions make it especially necessary to examine how the results were obtained. Thus he notes, for example, that the indexes were made for the document set vocabulary, and so certainly do not reflect an ordinary operational situation; that though the vocabulary distributions may have the same shape as those for larger document sets, absolute numbers of postings are low, again not reflecting an operational situation; that the search terms are less likely to be of varying subject generality than those of `real' vocabularies; that the search broadening was very artificial; and finally, representing a methodological as much as substantive problem, that there could well be unusually close verbal links between relevant documents and queries. Vickery also comments on the lack of statistical significance tests, and