CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Test Design
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Chapter 2
TEST DESIGN
There has been a considerable amount of comment during the past few years about
test design in general and the test design for Cranfield I in particular. Unfortunately,
much of this comment has been misinformed, owing both to a failure to appreciate the
basic problems and purposes of an evaluation test and to a failure to distinguish
between two main types of testing.
The first type of testing is that which is concerned with the evaluation of an opera-
tional information retrieval system, a sub-system of an operational system or a system
or sub-system proposed for an operational system. In all such cases, there is no
basic intention of advancing knowledge concerning information retrieval systems in
general, although in the present state of fragmentary knowledge, this may well be a
by-product. Basically, such a test is designed to provide data for an analysis that
will show how the system can work more efficiently, in either operational or economic
terms, in meeting the particular requirements of a given body of users. Such a test
was that performed by Lancaster on the index of the Bureau of Ships (reference 5). Well
designed on the basis of the Cranfield test procedure, with clearly defined and limited
objectives, it produced, economically and quickly, data which enabled decisions to be
taken on the optimum methods for the information retrieval system at
the Bureau of Ships. As a 'research' pay-off, it revealed yet another situation where
the use of roles was economically inefficient and operationally of doubtful value, and
added to the growing body of data on the problems created by the use of roles of the
type proposed by the Engineers Joint Council in the Thesaurus of Engineering Terms.
There are many different variations of this type of test situation. One can, for
instance, devise a new system or sub-system and test it while it is still comparatively
small as effectively as one can test the performance of a long-established operational
system. The characteristic of all such tests, however, is that they are made with a given
situation in mind: their parameters are fixed by the pre-determined environment of the
system being evaluated.
The second type of test - the type with which this report is concerned - is where
one is dealing with an experimental situation. In such a case, the purpose of the test
is to advance knowledge in some aspect of information retrieval without any particular
operational requirement in mind. For this to be done, it is necessary to advance from
a firm foundation of what is known. To make such an advance may require the use
of unproved techniques, and, since the attempt is being made to investigate the unknown,
there is always the possibility that, however meticulously the test has been designed,
some unexpected factor will interfere with the objective of the test. If such a factor
can be recognised early enough, it may be possible to adjust the design to take account
of the new situation, but the risk has to be accepted that the weakness may only become
apparent towards the end of the test.
A classic example of such a situation was the test carried out by Documentation
Inc., where the objective was to compare the performance of a Uniterm index with the
alphabetical subject catalogue compiled by the Armed Services Technical Information
Agency. The first stage of the test involved the indexing of 15,000 documents by the
Uniterm system, at the same time as they were also being indexed by the ASTIA staff.
The second stage was for the two groups to carry out searches in their indexes for
some ninety-odd questions, and then for each group to analyse the output of its searches
to find which documents were relevant. Up to this point, everything appears to have