Information Retrieval Experiment

The Cranfield tests (chapter). Karen Sparck Jones. Butterworth & Company.

relevance figure will be accepted. In other cases recall is less important, and greater emphasis will be placed on improved relevance.' (p.90) Finally, in considering recall and precision, it is 'necessary to consider the environment in which [a] system is operated and here the most important factor is the type of question which will be put to it.' (p.100) Working back to indexing, this bears on questions of the exhaustivity of indexing, its specificity, the provision for syntax, and weighting. Discussing these, Cleverdon maintains that 'it remains true that given the same concept indexing, any two descriptor languages will have the same information content, and therefore the same potentiality for retrieval.' (p.104) In other words, 'it is not the alternatives of classified or alphabetical arrangement, of post-co-ordinate or pre-co-ordinate indexing (much less the alternatives of manual or mechanical searching) which make any real difference in performance but the power of the descriptor language, allied to the standard of the indexing. The "power" of a descriptor language is in its ability to eliminate irrelevant references, and in addition to a hospitality for specific indexing, there are at least two other devices which can be used, namely "syntactic indexing" and "weighted indexing".'
(p.105)

Parallel tests

Cranfield 1 was paralleled by two investigations of existing systems, designed to throw light on the extent to which its results were influenced by the artificialities of the test design. One study was of the facetted system set up for the English Electric library at Whetstone (see Cleverdon³, Chapter 7). This test again involved searching for source documents on which questions were based, by both Cranfield project staff and, for a subset of the queries, by English Electric staff; the success rates for each, 77.4 per cent and 73.5 per cent, and the reasons for failures, paralleled those of Cranfield 1, and the same major problems of preferred order for the chain index were encountered. The conclusion was that the test methods developed at Cranfield were applicable in other environments, and perhaps that the results obtained in the tests represent the level of performance to be expected.

The second test was the joint Cranfield-WRU (Cranfield 1½) test of the WRU metallurgical index⁴, intended primarily as a study of testing techniques rather than as an evaluation of the index, at that time incomplete. In the test the WRU system, regarded at the time of the test as one of the most sophisticated novel approaches to indexing, was compared with facet indexing for 114 questions, again based on source documents, searched over 950 documents. Since evaluation simply by searching for source documents (though in this case without the wanted document numbers being known to the searchers) had been criticized, the test included an exhaustive assessment of other documents for relevance and the calculation of recall and precision