CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Comments
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
CHAPTER 8
Comments
This report has attempted to outline the reasoning and the procedures adopted
in the second Cranfield project. It could be argued that comments on these matters
should await the publication of the test results, but it is felt more appropriate to
conclude this volume by briefly considering some of the short-comings of the design
and the techniques used, and showing how the results might possibly be affected.
In Chapter 2, some aspects of the test design were considered from the viewpoint
of the decisions which seemed correct in 1961, at the time when the project was
prepared. While the test results and the conclusions which can be derived from them
will show to what extent the test design is such as to allow the objectives to be achieved,
there are certain matters which can be discussed immediately.
The original proposal suggested a collection of 1,200 documents with some 300
questions to be used for searching. For no very good reason, the total of documents
in the collection was increased to 1,400; while there would have been no difficulty in
finding 300 usable search questions from the 641 that were submitted, only 279 questions
were used, and, for most of the tests, this number effectively was reduced to 221. The
amount of data which has been obtained from this question-document set is vast, and
is more than sufficient for validation of the test results. It can at present only be a
matter for discussion as to whether the question-document set was larger than necessary.
In many of the tests, sub-sets of the collection were used, sub-sets such as 200
documents and 42 questions. There is a double danger in the use of such comparatively
small sets; firstly that they will produce results which are unrepresentative, and
secondly that the performance measures will be seriously distorted. To consider the
latter point, investigating the effect of generality ratio was a part of the project and
although the matter is somewhat complex, it has been possible to work out the relationship
between the performance figures for varying generality ratios. This work is
reaching the stage where it can be applied in all situations, so this particular problem
need no longer create any difficulty in the use of a small collection.
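As an illustration of why the generality ratio matters when a sub-set is used, the sketch below applies the standard identity linking precision to recall, fallout and generality. It is not necessarily the relationship worked out in the project; the function name is hypothetical, generality is expressed here as a plain fraction (relevant documents divided by collection size), and recall and fallout are assumed to stay fixed as the collection changes.

```python
def precision_from_generality(recall, fallout, generality):
    """Precision implied by recall, fallout and generality.

    recall     = relevant retrieved / all relevant
    fallout    = non-relevant retrieved / all non-relevant
    generality = all relevant / collection size (a fraction, an assumption
                 of this sketch rather than the report's own scaling)
    """
    # Both terms are expressed per document in the collection.
    relevant_retrieved = recall * generality
    nonrelevant_retrieved = fallout * (1.0 - generality)
    total_retrieved = relevant_retrieved + nonrelevant_retrieved
    if total_retrieved == 0:
        return 0.0  # nothing retrieved, so precision is taken as zero
    return relevant_retrieved / total_retrieved


# Same recall and fallout, two different generality values.
high = precision_from_generality(recall=0.5, fallout=0.01, generality=0.02)
low = precision_from_generality(recall=0.5, fallout=0.01, generality=0.005)
print(high, low)
```

With recall and fallout held constant, the lower generality gives the lower precision, which is the kind of distortion that has to be allowed for when figures from a small sub-set are compared with those from the complete collection.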
Far more serious is the question of whether the collection size is large enough to
give valid results. It has to be remembered that this investigation has been concerned
with only one variable, namely index language devices, and this is quite unlike the
situation in Cranfield I, in which additional variables were such matters as indexing
time, indexers, and type of document. The result is that a much smaller set than the
18,000 documents and 1,200 questions of Cranfield I was required and there does not
seem to be any doubt but that the collection of 1,400 documents was large enough for
the test. The experience at Cranfield and Harvard of working with a sub-set of 200
documents and 42 questions has produced some useful evidence on the question as to
whether the total collection was larger than necessary. With the knowledge that the sub-set
produced results very similar to those obtained with the complete collection (when
due allowance is made for the generality ratio), it now seems possible that a smaller
collection would have served equally well. However, lacking this hind-sight knowledge,
it is very likely that the results obtained with a smaller collection would have been
subject to criticism which could not have been satisfactorily refuted.
The method of obtaining a document collection and a set of questions turned out to
be a perfectly satisfactory way of operating. The response from the authors of
research papers was remarkably good, and can be interpreted as showing that the