Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
Cranfield 2
detailed figures and graphs presented in Volume 2. However, since relying
on only one form of measure might be dangerous, fallout figures were worked
out for the main runs; and an alternative representation of recall and precision
using ranking rather than levels, the so-called 'document output cutoff'
method, was also supplied for the main run outputs.
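The three measures mentioned can be sketched as follows. This is a minimal illustration with invented sets, not the Report's own computations; Cranfield 2 tabulated these figures over many runs and levels.

```python
# Illustrative sketch only: recall, precision and fallout for a single search,
# computed from hypothetical retrieved and relevant document sets.

def evaluate(retrieved, relevant, collection_size):
    """Return (recall, precision, fallout) for one search over a collection."""
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)          # relevant documents retrieved
    non_relevant_total = collection_size - len(relevant)
    recall = hits / len(relevant)             # proportion of relevant retrieved
    precision = hits / len(retrieved)         # proportion of retrieved that are relevant
    fallout = (len(retrieved) - hits) / non_relevant_total  # non-relevant retrieved
    return recall, precision, fallout

# Invented example: 3 of 4 relevant documents found among 10 retrieved,
# in a 200-document collection (the size of the smaller Cranfield collection).
r, p, f = evaluate(range(10), [0, 1, 2, 100], 200)
```

Fallout is the measure that guards against the "dangerous" reliance on recall and precision alone, since it registers how much of the non-relevant collection a strategy drags in.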
The discussion of these extremely difficult issues in the Report is important
both in showing the attention paid to the question by the project, and in
emphasizing their intractability for any project.
It is impossible here to do more than refer briefly to the great mass of
individual results presented in Volume 2: in providing this detail for reader
study the Cranfield 2 Report is much superior to that for Cranfield 1. It is
sufficient to note that the main results fall into 9 groups: the first group (4.1)
gives performance for the 221 x 1400 and 42 x 200 collections for several
single term languages, supporting the authors' view that the smaller
collection could justifiably be used for most of the experiments; the second
(4.2) compares all the recall devices for single terms for the 42 questions and
200 documents, showing some loss of performance with the most gross term
reduction; group 4.3 tests concepts and themes, for small query sets but 1400
documents, showing not much difference in performance; group 4.4 examines
exhaustivity levels with single terms, for both large and small collections,
again showing not much variation in performance; group 4.5 studies search
rules for the single term languages, for small query sets but 1400 documents,
suggesting some superiority in the more stringent strategies; the concept
languages compared for the 42 x 200 collection in 4.7 show large performance
variations, and this is also true of the controlled languages compared in 4.8
for this collection; abstracts and titles regarded as indexing languages are
compared in 4.9, again for the 42 x 200 collection, showing abstracts inferior.
Section 4.6 covers a secondary variable comparison on the different relevance
grades, for the single term languages.
To obtain an overview of the co-ordination level results some comparisons
are made of performance at specific co-ordination levels: for example, for the
42 x 200 collection at co-ordination level 3 (Figure 6.10T), the various single
term languages range from recall 66.7 per cent with precision 14.8 per cent
for the simplest language to 82.3 per cent and 7.4 per cent for the most
'condensed' hierarchical one. For co-ordination 2 for the concept languages
(Figure 6.12T), deemed comparable with level 3 for the simple terms,
performance ranges from recall 84.8 per cent with precision 6.1 per cent for
the most condensed to recall 14.1 per cent with precision 50.9 per cent for the
given basic indexing language, while for controlled indexing at level 2 (Figure
6.14T) performance ranges from recall 68.7 per cent with precision 12.6 per cent
for the basic to recall 94.4 per cent with precision 5.1 per cent for the most
condensed descriptions. The picture is of low recall and high precision for
concepts, higher recall and lower precision for single terms, and highest
recall and lowest precision for controlled. Comparing the graphs for the most
basic members of the three classes shows single terms and controlled very
similar, with concepts having very much lower recall (Figure 6.1P); however
when the best members of each class are taken performance is very similar,
with single terms probably superior to controlled and definitely superior to
concepts (Figure 6.2P).
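The mechanics of co-ordination level searching can be sketched as follows. The index data here is invented; the point is only that a document is retrieved at level k if it shares at least k index terms with the query, so that lowering k raises recall at the expense of precision, which is the trade-off visible in the figures just cited.

```python
# Hedged sketch of co-ordination level searching (terminology from the text;
# the document index below is invented for illustration).

def retrieve_at_level(query_terms, index, k):
    """Return ids of documents sharing at least k index terms with the query."""
    q = set(query_terms)
    return {doc_id for doc_id, terms in index.items()
            if len(q & set(terms)) >= k}

index = {
    1: ["wing", "flutter", "supersonic"],
    2: ["wing", "boundary", "layer"],
    3: ["heat", "transfer"],
}

query = ["wing", "flutter", "supersonic"]
hits3 = retrieve_at_level(query, index, 3)  # strictest: all three terms must match
hits1 = retrieve_at_level(query, index, 1)  # loosest: any one term suffices
```

Stepping k down from 3 to 1 admits document 2 alongside document 1, enlarging the retrieved set and hence trading precision for recall.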
The main aim of the alternative document output cutoff representation