Information Retrieval Experiment
The Cranfield tests
Karen Sparck Jones
Butterworth & Company
so that opposing claims could be evaluated, and by this time we had
definite views as to how such an investigation could be carried out.' (p. ii)
In a paper to a Special Libraries Association meeting in 1955 Cleverdon
argued that independent evaluation of rival claims was needed, and this led
to National Science Foundation funding of the Aslib Cranfield Project in
1957.
The project was designed to investigate `the comparative efficiency of four
indexing systems', involving the indexing of:
`18,000 research reports and periodical articles in the general field of
aeronautical engineering, with half of the documents dealing with the
specialized subject of high speed aerodynamics.' (p.1)
The general objectives of the test were determined by the problems presented
by the growth of the scientific literature, the increasing complexity of
research work, and corresponding proposals for new retrieval systems and
implementations. As the project proposal stated,
`in all the controversies that have raged during the past fifty years on the
basic points of a book catalogue or card catalogue, with an alphabetical
subject arrangement or a classified arrangement, it is interesting to note
that no attempt has been made to carry out any controlled tests that would
enable one to make statements based on fact rather than voice theoretical
opinions.' (p.4)
The proposal quotes a remark by the Editor of American Documentation in
1955 to the effect that we must regard
`documentation systems as useful devices, the benefits of which must be
determined, not by polemics, but by the intelligent measurement of such benefits
in relation to needs and costs.' (p.5)
As the proposal noted,
`the complication in attempting to evaluate the comparative efficiency of
any two retrieval systems is due to the number of various factors which
have to be considered. These can be summarised as follows:
(1) The documents which are to be indexed.
(2) The system of indexing.
(3) The indexer's subject knowledge of the documents being indexed.
(4) The indexer's familiarity with the indexing system.
(5) The size of the index.
(6) The type of question which is to be put to the index.
(7) The equipment to be used in recording or retrieving data.
(8) The overall efficiency, which is made up of:
(a) The time cost in preparing the index.
(b) The time cost in locating required information.
(c) The cost of equipment used.
(d) The probability of producing the required answer.
(e) The absence of irrelevant answers (`noise').
(f) The number of searches made.' (p.5)
The essential features of the project were thus that it was a comparative