IRE
Information Retrieval Experiment
The Cranfield tests
chapter
Karen Sparck Jones
Butterworth & Company
'is so controversial and so unexpected that it is bound to throw considerable
doubt on the methods which have been used to obtain [the] results, and our
own first reaction was to doubt the evidence. A complete recheck has
failed to reveal any discrepancies, and unless one is prepared to say that
the whole test conception is so much at fault that the results are completely
distorted, then there is no other course except to attempt to explain the
results which seem to offend against every canon on which we were trained
as librarians.' (p.252)
They find that distortion by the subject area cannot be ruled out, but that
collection size did not distort the results; that the relation between
questions and cited relevant documents is unlikely to have affected the
results; that indexing failures or omissions are highly unlikely to have
occurred often enough to influence the results; and that the classifications
used were well prepared and so unlikely to have had any untoward effect.
They conclude that,
'with the possible doubtful exception of the subject field, there appears to
be nothing in the test environment which could be held responsible for
serious distortion of the results as between one system and another'
(p.262),
and continue,
'this test has shown that natural language, with the slight modifications of
confounding synonyms and word forms, combined with simple coordination,
can give a reasonable performance. This means that, based on such
practice, a norm could be established for operational performance in any
subject field, and it would then be for those who proposed new thesauri,
new relational groups, links, or roles, to show how the use of their
techniques would improve on the norm.' (p. 263b)
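As a rough illustration of what 'natural language, with the slight modifications of confounding synonyms and word forms, combined with simple coordination' amounts to, the following Python sketch indexes documents by their own words, conflates word forms and synonyms, and retrieves any document that shares at least a given number of terms (the coordination level) with the query. The suffix list, the synonym table, the sample documents and the function names are all invented for illustration. They are not taken from the Cranfield materials, and the sketch makes no claim to reproduce the procedures actually used in the test.

```python
# Illustrative only: toy conflation rules and documents, not Cranfield data.
SYNONYMS = {"aerofoil": "airfoil"}   # hypothetical synonym table
SUFFIXES = ("ing", "s")              # crude word-form conflation


def normalize(word: str) -> str:
    """Reduce a word to a conflated index term."""
    term = word.lower()
    for suffix in SUFFIXES:
        # Strip the first matching suffix, keeping at least three letters.
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            term = term[: -len(suffix)]
            break
    return SYNONYMS.get(term, term)


def index_terms(text: str) -> set[str]:
    """Single-term natural language indexing: every word becomes a term."""
    return {normalize(word) for word in text.split()}


def coordination_search(query: str, docs: dict[str, str], level: int):
    """Return (doc_id, matches) for documents sharing >= level query terms."""
    query_terms = index_terms(query)
    hits = []
    for doc_id, text in docs.items():
        matches = len(query_terms & index_terms(text))
        if matches >= level:
            hits.append((doc_id, matches))
    # Best-matching documents first; ties broken by identifier.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


DOCS = {
    "d1": "heat transfer in laminar boundary layers",
    "d2": "pressure distribution on aerofoils",
    "d3": "boundary layer heating on airfoil surfaces",
}
QUERY = "airfoil boundary layer heating"

if __name__ == "__main__":
    # Lowering the coordination level broadens the search.
    for level in (4, 3, 1):
        print(level, coordination_search(QUERY, DOCS, level))
```

With this toy data a coordination level of 4 retrieves only d3, relaxing it to 3 also brings in d1, and at level 1 the synonym table lets d2 match through 'aerofoils'. Relaxing the level in this way is one simple form of search broadening.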
13.4 Criticisms of Cranfield 2
Unfortunately, though it is evident that Cranfield 2 was much more carefully
designed than Cranfield 1, it was still open to methodological criticisms.
Some of these were made by Vickery [20], who points out that the unexpected
conclusions make it especially necessary to examine how the results were
obtained. Thus he notes, for example, that the indexes were made for the
document set vocabulary, and so certainly do not reflect an ordinary
operational situation; that though the vocabulary distributions may have the
same shape as those for larger document sets, absolute numbers of postings
are low, again not reflecting an operational situation; that the search terms
are less likely to be of varying subject generality than those of 'real'
vocabularies; that the search broadening was very artificial; and finally,
as much a methodological as a substantive problem, that there could well
be unusually close verbal links between relevant documents and queries.
Vickery also comments on the lack of statistical significance tests, and