IRE
Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
exhaustivity and specificity, reflecting the language of the document.
Variations in exhaustivity could be obtained by utilizing importance weights
attached to the keywords of the base description, while sufficient specificity
was achieved by allowing multi-keyword strings and phrases. Thus the basic
description provided both 'concepts' representing interfixed (linked) keywords
and 'themes' representing higher-level linked concepts, and also
weights for the individual keywords. In other words the description embodied
several precision devices, though not roles, which were found to be
inapplicable (pp. 5-7), while recall devices were left for application at
search time; the precision devices had clearly to be derived from the
document itself, but they could be abandoned for study purposes. The initial
indexing therefore supplied four languages (single terms, concepts, themes,
and weighted keywords), which could of course be combined. Recall devices
could naturally be applied either to keywords or to concepts; for keywords these
were represented by synonym confounding, word-form combining, both of
these together, and three grades of hierarchical reduction, giving a total of
eight different languages. The provision of the various types of word
classification embodying the recall devices is described in considerable detail.
In addition, since these languages were all based on the initial natural
language, a conventional controlled language index based on the EJC
Thesaurus was used. The title and abstract tests, concurrently being studied
by Salton, were regarded as representing other languages. Altogether, as
Chapter 1 of Volume 2 of the Report shows, the various types and
combinations of device applied to the three starting points of single terms,
concepts, and controlled terms gave eight languages for the first, 15 for the
second, and six for the third, i.e. 29 languages in total, to be tested.
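
The effect of the two simplest recall devices named above, synonym confounding and word-form combining, can be illustrated with a minimal sketch. It is not the Cranfield procedure: the term classes, the function names `normalize` and `matches`, and the sample document and query are all hypothetical, chosen only to show how merging terms into classes widens matching.

```python
# Toy illustration of recall devices applied to single terms.
# All vocabulary below is hypothetical, not taken from the Cranfield Report.

# Synonym confounding: variant terms are mapped to one class label.
SYNONYMS = {"aerofoil": "wing", "airfoil": "wing"}

# Word-form combining: inflected forms are mapped to one base form.
WORD_FORMS = {"wings": "wing", "heated": "heat", "heating": "heat"}


def normalize(term, use_synonyms=False, use_word_forms=False):
    """Map a single term to its class label under the chosen recall devices."""
    if use_word_forms:
        term = WORD_FORMS.get(term, term)
    if use_synonyms:
        term = SYNONYMS.get(term, term)
    return term


def matches(query_terms, doc_terms, **devices):
    """True if every normalized query term occurs among the normalized document terms."""
    doc = {normalize(t, **devices) for t in doc_terms}
    return all(normalize(t, **devices) in doc for t in query_terms)


doc = ["aerofoil", "heating"]   # index terms of one hypothetical document
query = ["wing", "heat"]        # a hypothetical two-term request

print(matches(query, doc))                                          # no devices
print(matches(query, doc, use_synonyms=True))                       # synonyms only
print(matches(query, doc, use_word_forms=True))                     # word forms only
print(matches(query, doc, use_synonyms=True, use_word_forms=True))  # both together
```

In this toy case the document is retrieved only when both devices are applied together, which is the sense in which each further device yields a distinct index language with potentially higher recall.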
An interesting point about the test, made by Michael Keen (personal
communication), is that the original test design described in Cleverdon and
Mills assumed that the initial concept indexing would be so done as to allow
100 per cent recall; however the procedure for checking on this was not
followed, suggesting that the idea that system imperfections could be ruled
out by experimental procedures was tacitly accepted as unrealistic and was
replaced by a principle that care should be taken in indexing, though
perfection could not be attained.
The actual conduct of the searching presented many problems, given the
many options to be tested, the absence of convenient computer facilities, and
the requirement that the physical form of the index should not interfere with
its use. The description of the heroic clerical procedures involved, centring
on the delightfully named 'beehive' cabinet, makes interesting reading, and
it is worth noticing that even with a computer, the organization of the range
of searches involved in Cranfield 2 would be non-trivial.
The first volume of the Report shows the basis for the Cranfield 2 test, i.e.
the type of indexing and index language hypotheses involved, and the
approaches adopted to providing the test vehicle. The relationship of the
primary test variables to others is summarized in Volume 2 (Ref. 19), as a
preliminary to the discussion of the results. Thus the Report authors
distinguish four classes of retrieval system factor: environmental, including
subject field and collection size; software, namely indexing, with respect to
exhaustivity, language, with respect to specificity, and searching; operational,
including time, personnel, etc., and also performance; and hardware, for