IRE
Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
relevance figure will be accepted. In other cases recall is less important,
and greater emphasis will be placed on improved relevance.' (p.90)
Finally, in considering recall and precision, it is
`necessary to consider the environment in which [a] system is operated and
here the most important factor is the type of question which will be put to
it.' (p.100)
Working back to indexing this bears on questions of the exhaustivity of
indexing, its specificity, the provision for syntax, and weighting. Discussing
these, Cleverdon maintains that
`it remains true that given the same concept indexing, any two descriptor
languages will have the same information content, and therefore the same
potentiality for retrieval.' (p.104)
In other words,
`it is not the alternatives of classified or alphabetical arrangement, of post-
co-ordinate or pre-co-ordinate indexing (much less the alternatives of
manual or mechanical searching) which make any real difference in
performance but the power of the descriptor language, allied to the
standard of the indexing. The "power" of a descriptor language is in its
ability to eliminate irrelevant references, and in addition to a hospitality
for specific indexing, there are at least two other devices which can be
used, namely "syntactic indexing" and "weighted indexing".' (p.105)
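To make the idea of a device that eliminates irrelevant references concrete, the following is a minimal sketch of weighted indexing, not a reconstruction of any Cranfield procedure. The toy index, the weights (1 for a minor mention, 3 for a central topic), the relevance judgement and the function names are all assumptions invented for illustration. It shows how demanding a minimum weight on each search term can raise precision while leaving recall unchanged in this small case.

```python
# Hypothetical toy index: document id -> {descriptor: weight}.
# Weights are illustrative only (1 = minor mention, 3 = central topic).
INDEX = {
    "d1": {"fatigue": 3, "aluminium": 3},
    "d2": {"fatigue": 1, "aluminium": 3},
    "d3": {"fatigue": 3, "aluminium": 1},
    "d4": {"fatigue": 3, "steel": 3},
}
RELEVANT = {"d1"}  # assumed relevance judgement for the query below


def search(index, terms, min_weight=1):
    """Return ids of documents indexed by every term at or above min_weight."""
    return {
        doc
        for doc, descriptors in index.items()
        if all(descriptors.get(term, 0) >= min_weight for term in terms)
    }


def recall_precision(retrieved, relevant):
    """Recall = hits / relevant; precision = hits / retrieved."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


query = ["fatigue", "aluminium"]
unweighted = search(INDEX, query)               # ignores weights
weighted = search(INDEX, query, min_weight=3)   # central topics only

print(recall_precision(unweighted, RELEVANT))
print(recall_precision(weighted, RELEVANT))
```

In this invented example the unweighted search retrieves three documents, of which one is relevant, whereas the weighted search retrieves only the relevant one. With real collections a weight threshold may of course also lose relevant documents, which is the trade-off between recall and precision discussed above.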
Parallel tests
Cranfield 1 was paralleled by two investigations of existing systems designed
to throw light on the extent to which its results were influenced by the
artificialities of the test design.
One study was of the facetted system set up for the English Electric library
at Whetstone (see Cleverdon3, Chapter 7). This test again involved searching
for source documents on which questions were based, by both Cranfield
project staff and, for a subset of the queries, by English Electric staff; the
success rates for each, 77.4 per cent and 73.5 per cent, and the reasons for
failures, paralleled those of Cranfield 1, and the same major problems of
preferred order for the chain index were encountered. The conclusion was
that the test methods developed at Cranfield were applicable in other
environments, and perhaps that the results obtained in the tests represent the
level of performance to be expected.
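The success rates quoted above are of the source-document kind: a search counts as a success if it retrieves the document on which the question was based. A minimal sketch of that calculation follows. The question and document identifiers are invented, and the function name is hypothetical; the figures it produces bear no relation to the Whetstone results.

```python
def source_success_rate(results):
    """Percentage of questions whose source document was retrieved.

    results maps a question id to a pair:
    (source document id, set of document ids retrieved by the search).
    """
    if not results:
        return 0.0
    found = sum(
        1 for source, retrieved in results.values() if source in retrieved
    )
    return 100.0 * found / len(results)


# Invented example data: four questions, three successful searches.
searches = {
    "q1": ("doc17", {"doc17", "doc40"}),
    "q2": ("doc23", {"doc08"}),
    "q3": ("doc31", {"doc31"}),
    "q4": ("doc52", {"doc52", "doc60", "doc61"}),
}

rate = source_success_rate(searches)
print(f"{rate:.1f} per cent")
```

A measure of this kind says nothing about the other documents retrieved, whether relevant or not, since only the source document is checked for each question.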
The second test was the joint Cranfield-WRU (Cranfield 1½) test of the
WRU metallurgical index4, intended primarily as a study of testing
techniques rather than as an evaluation of the index, at that time incomplete.
In the test the WRU system, regarded at the time of the test as one of the
most sophisticated novel approaches to indexing, was compared with facet
indexing for 114 questions, again based on source documents, searched over
950 documents. Since evaluation simply by searching for source documents
(though in this case without the wanted document numbers being known to
the searchers) had been criticized, the test included an exhaustive assessment
of other documents for relevance and the calculation of recall and precision