IRE: Information Retrieval Experiment. Chapter: The Cranfield tests. Karen Sparck Jones. Butterworth & Company.

Karen Sparck Jones. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

exhaustivity and specificity, reflecting the language of the document. Variations in exhaustivity could be obtained by utilizing importance weights attached to the keywords of the base description, while sufficient specificity was achieved by allowing multi-keyword strings and phrases. Thus the basic description provided both `concepts' representing interfixed (linked) keywords and `themes' representing higher-level linked concepts, and also weights for the individual keywords. In other words the description embodied several precision devices, though not roles, which were found to be inapplicable (pp. 5-7), while recall devices were left for application at search time; the precision devices had clearly to be derived from the document itself, but they could be abandoned for study purposes.

The initial indexing therefore supplied four languages: single terms, concepts, themes, and weighted keywords, which could of course be combined. Recall devices could be naturally applied either to keywords or concepts; for the first these were represented by synonym confounding, word-form combining, both of these together, and three grades of hierarchical reduction, giving a total of eight different languages. The provision of the various types of word classification embodying the recall devices is described in considerable detail.
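The way such devices change what a search can match may be easier to see in a small sketch. The following Python fragment is purely illustrative and is not the Cranfield procedure: the synonym and word-form tables, the weight scale (higher meaning more important), the sample document and query, and all function names are assumptions invented for the example. It shows exhaustivity being varied by a weight threshold, and synonym confounding and word-form combining being applied as optional recall devices to single terms.

```python
# Toy tables for illustration only; they are NOT the Cranfield 2 word classes.
SYNONYMS = {"airfoil": "aerofoil", "airplane": "aircraft", "aeroplane": "aircraft"}
WORD_FORMS = {"compressor": "compress", "compressors": "compress",
              "compressed": "compress", "flows": "flow", "flowing": "flow"}


def select_by_weight(weighted_keywords, min_weight):
    """Vary exhaustivity: keep keywords whose weight is at least min_weight."""
    return {kw for kw, w in weighted_keywords.items() if w >= min_weight}


def apply_devices(terms, synonyms=False, word_forms=False):
    """Apply the chosen recall devices to a set of single terms."""
    if synonyms:
        # Synonym confounding: replace each term by its class representative.
        terms = {SYNONYMS.get(t, t) for t in terms}
    if word_forms:
        # Word-form combining: replace each term by its shared word form.
        terms = {WORD_FORMS.get(t, t) for t in terms}
    return set(terms)


def matched_terms(query, weighted_keywords, min_weight=1, **devices):
    """Return the terms shared by query and document under one 'language'."""
    doc_terms = apply_devices(select_by_weight(weighted_keywords, min_weight), **devices)
    return apply_devices(query, **devices) & doc_terms


# Hypothetical document description (keyword -> weight) and query.
doc = {"airfoil": 3, "compressors": 2, "flow": 1}
query = {"aerofoil", "compressor"}

print(matched_terms(query, doc))                                  # natural language
print(matched_terms(query, doc, synonyms=True))                   # synonym confounding
print(matched_terms(query, doc, synonyms=True, word_forms=True))  # both devices
print(matched_terms(query, doc, min_weight=3, synonyms=True, word_forms=True))
```

With these toy tables the unmodified terms share nothing with the query, synonym confounding recovers one shared term, and adding word-form combining recovers both; raising the weight threshold then removes the lower-weighted keyword again, which is the sense in which the weights trade exhaustivity against the breadth of matching.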
In addition, since these languages were all based on the initial natural language, a conventional controlled language index based on the EJC Thesaurus was used. The title and abstract tests, concurrently being studied by Salton, were regarded as representing other languages. Altogether, as Chapter 1 of Volume 2 of the Report shows, the various types and combinations of device applied to the three starting points of single terms, concepts, and controlled terms gave eight languages for the first, 15 for the second, and six for the third, i.e. 29 languages in total, to be tested.

An interesting point about the test, made by Michael Keen (personal communication), is that the original test design described in Cleverdon and Mills assumed that the initial concept indexing would be so done as to allow 100 per cent recall; however the procedure for checking on this was not followed, suggesting that the idea that system imperfections could be ruled out by experimental procedures was tacitly accepted as unrealistic and was replaced by a principle that care should be taken in indexing, though perfection could not be attained.

The actual conduct of the searching presented many problems, given the many options to be tested, the absence of convenient computer facilities, and the requirement that the physical form of the index should not interfere with its use. The description of the heroic clerical procedures involved, centring on the delightfully-named `beehive' cabinet, makes interesting reading, and it is worth noticing that even with a computer, the organization of the range of searches involved in Cranfield 2 would be non-trivial.

The first volume of the Report shows the basis for the Cranfield 2 test, i.e. the type of indexing and index language hypotheses involved, and the approaches adopted to providing the test vehicle. The relationship of the primary test variables to others is summarized in Volume 2 (Ref. 19), as a preliminary to the discussion of the results.
Thus the Report authors distinguish four classes of retrieval system factor: environmental, including subject field and collection size; software, namely indexing, with respect to exhaustivity, language, with respect to specificity, and searching; operational, including time, personnel, etc., and also performance; and hardware, for