of the points made stemmed from a failure to appreciate the character of the
test as a controlled experiment, others were sound and were explicitly or
implicitly seen to be so and hence were catered for in the planning and
conduct of Cranfield 2. Even so, the critics agreed on the significance of
Cranfield 1 as a retrieval experiment: though its results were not readily
accepted, its status as, in Michael Keen's words, a `pioneering and relevant'
test was recognized not only subsequently, but at the time.
The influence of the Aslib Cranfield Project work in the first half of the
1960s can be seen both in specific tests like Herner and Co.'s Bureau of Ships
investigation11, and, more broadly, in the application of particular lessons to
be learnt from it in retrieval system testing in general. From the system point
of view it suggested that the indexing language might be less important, and
other factors more important, than had been supposed, while from the
methodological point of view it stimulated more careful design, in terms both
of control and realism. For measuring system performance it did much to
promote the use of recall and precision (defined below). That such lessons were learnt from
the Cranfield research is clear from discussions of system testing like
Kyle's12: she explicitly asks `What have we learnt?' and `Where do we go
from here?', and seeks to provide some answers. Some of the Case Western
Reserve University research13 was also a direct response to the Cranfield
work, as Rees indicates14,15. More generally, Cranfield 1 and 1½ led to a great
deal of discussion of retrieval systems and their testing, illustrated by
Cleverdon's argument with Swanson about the Cranfield hypotheses16, and
by the debate at the FID/CR Conference17 in 1964. The wider influence of
the Cranfield experiments on system evaluation at a time when this was
developing, especially in the context of system automation, was therefore
considerable.
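To fix terms (the formulation here is the now-standard one, not a quotation from the Cranfield reports): if a search retrieves some set of documents from a collection in which certain documents are relevant to the query, then

\[
\text{recall} = \frac{\text{number of relevant documents retrieved}}{\text{total number of relevant documents in the collection}}\,, \qquad
\text{precision} = \frac{\text{number of relevant documents retrieved}}{\text{total number of documents retrieved}}\,.
\]

The inverse relation between the two measures discussed below is then the familiar trade-off: casting the search net more widely tends to raise recall while lowering precision, and vice versa.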
13.3 Cranfield 2
However, the major impact of Cranfield 1 and its associated experiments was
on Cranfield 2, which was specifically designed as a development of
Cranfield 1: this is clear from Cleverdon and Mills' account18 of the
philosophy underlying Cranfield 2. Cleverdon, Mills and Keen's view in the
first volume of the Cranfield 2 Report19 was that while Cranfield 1 and 1½
were of general value in demolishing preconceptions about indexing
languages, in showing that operational systems could be readily evaluated, in
providing considerable data, and in encouraging discussions of systems and
their evaluation, they also led to specific hypotheses which were taken as the
basis for the new study. These were seven of Swanson's, namely that 4
minutes for indexing is enough, that technical knowledge is not required, that
systems operate at 70-90 per cent recall and 8-20 per cent precision, that
there is an optimum level of exhaustivity, that there is an inverse relation
between recall and precision, that raising precision 1 per cent lowers recall 3
per cent, and that the most significant Cranfield 1 result was that the four
languages performed the same, plus six more: these were that the most important
factors to be measured in system evaluation are recall and precision; that the
physical form of the store has no effect on performance so measured; that for
the same concept indexing, different languages will perform much the same;