IRE
Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
tests in practice limited performance measurement, be interpreted as
referring primarily to effectiveness.
13.1 Cranfield 1
The ancestor of Cranfield 1 was a pilot experiment in the use of Uniterms for
indexing aeronautical documents which was carried out at Cranfield in 1953.
The data for the test consisted of 40 questions based on source documents
which were searched on 200 documents; performance was measured as the
success rate in retrieving the query sources, which was 83 per cent (strictly
82.5 per cent). The test formed part of a group described by Thorne¹ in 1955
involving a comparison between Uniterms and the UDC as used at the
Royal Aircraft Establishment, and a subsequent comparison between the
NLL specialized aeronautical indexing language developed at the National
Aeronautical Research Institute, Amsterdam, and the UDC. Performance
for the 40 questions searched with UDC and RAE was 50 per cent, compared
with 85 per cent for Uniterms, while performance for the NLL language
using another set of questions was 88 per cent as opposed to 80 per cent for
the UDC on a subset of these.
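
The success-rate measure used in these early tests can be illustrated with a minimal sketch. It assumes that each question is tied to exactly one source document and that a search counts as a success when that document is among those retrieved. The function name, the data layout and the toy data are hypothetical, not taken from Thorne's report. The count of 33 successes is inferred from the reported figure, since 82.5 per cent of 40 questions is 33.

```python
def source_success_rate(retrieved_by_query, source_by_query):
    """Return the fraction of queries whose search retrieved the query's source document."""
    hits = sum(
        1
        for query_id, source_doc in source_by_query.items()
        if source_doc in retrieved_by_query.get(query_id, set())
    )
    return hits / len(source_by_query)


# Hypothetical toy data: 40 questions, each based on one source document.
source_by_query = {q: f"doc{q}" for q in range(40)}

# Assume the first 33 searches retrieve their source and the remaining 7 miss it.
retrieved_by_query = {
    q: ({f"doc{q}"} if q < 33 else {"some-other-doc"}) for q in range(40)
}

rate = source_success_rate(retrieved_by_query, source_by_query)
print(f"{rate:.1%}")  # 33 of 40 questions, i.e. 82.5%
```

A measure of this kind records only whether the known source was found; it says nothing about other relevant documents or about how much irrelevant material was retrieved with it.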
This group of tests, though defective, for example in using different
question and document sets, already exhibits some key features of the
Cranfield tradition: methodologically the use of source document queries,
substantively an interest in subject-oriented indexing. Thorne's account of
the set of tests also emphasizes costs, which have been one of Cleverdon's
persistent concerns.
The part played by Cleverdon's experience in these tests in promoting
Cranfield 1 is clearly indicated in the Preface to the first volume of the
Cranfield 1 Report². Thus Cleverdon notes that while the specialized NLL
scheme developed by 1953 would apparently complement the more general
UDC satisfactorily, something cheaper was desirable. Uniterms were a
possibility, but the results obtained from a second test following the one just
described were not too promising. However, the work on the NLL scheme
itself meant that test procedures had to be devised and, as Cleverdon says,
`by this time [1954] I had become convinced that the only way to obtain a
valid comparison between systems would be to control conditions in such
a way that there was an economic basis for the comparison. At the
Conference of the Aslib Aeronautical Group in 1955 I read a paper in
which, for the first time, the necessity for controlled experiments was put
forward.' (p. ii)
This conference was, moreover, partly sponsored by the Classification
Research Group, which was actively discussing novel classification
techniques; and new approaches to indexing and retrieval were also being put
forward in the United States. `It was clear', Cleverdon continues,
`that claims were being made by proponents which, while possibly correct,
could not be considered proven by results; just as clearly many of the
arguments being used by opponents of the systems were equally unproven
or trivial. It seemed desirable that a serious investigation should be made