CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Test Design
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
gone according to plan. The final stage was intended to be a comparison of the out-
put of the two sets of searches, in order to find which system had been successful
in obtaining more relevant documents.
The problem which arose at this final stage was that neither group was willing
to accept the relevance assessments of the other group; rumour has it that at the
end of the second day of discussion, the two groups were still arguing about the
meaning of the first search question. No real blame can be fixed on those who
organised the test; in 1952 it was not unreasonable to think that two groups of intelli-
gent people would, without serious difficulty, be able to come to an amicable agreement
as to which documents were relevant to a particular question. If any fault can be
found, it only lies in the failure to make generally available either of the two reports
which are said to have been prepared by the two groups taking part in the test. The
only published account was a brief paper by Gull which appeared some years later in
American Documentation (reference 10), and which dealt mainly with the results of
the searches. Gull does, however, make the following very apt comment: "When
one considers that a fairly thorough search of the literature indicates that this compari-
son of two reference systems is the first undertaken so far, it is not surprising
that the results revealed clerical errors and an incomplete design of the test."
With the exception of a small test done in 1953 by Cleverdon and Thorne (ref. 11),
this had been the only test of an I.R. system carried out before the test design for
Cranfield I was prepared in 1956. While access to the complete reports of the
ASTIA-Uniterm test might have revealed some more information, the only positive
fact known in 1956 concerning the test design of I.R. systems was that failure to have a
firm agreement on question-document relevance could result in complete failure to
realise the test objectives. Concerning information retrieval systems, however,
nothing was known for certain. For any belief categorically stated by one expert, it
was possible to find the exact opposite stated by another expert. Those were, in
fact, the halcyon days when one could argue all night without producing a shred of
evidence for one's views, when Metcalfe, for instance, could write a fascinating book
(ref. 12) proving in three hundred pages that an alphabetical subject catalogue was
vastly superior to a classified catalogue without having to, or being able to, present
one piece of experimental data to support any of his many assertions.
The field of investigation for Cranfield I was therefore wide open, in the sense
that it would prove or disprove some conflicting beliefs. Since it was uncertain as
to what was of major importance, the decision was deliberately taken to plan the test
over a wide range of aspects. Not only index languages but also the qualifications of
indexers, indexing time, categories of documents, search tactics and search capability,
optimistically (over-optimistically, some might argue), were all incorporated in the test
design. Any knowledge would be new knowledge and there was practically no limit
to what could be attempted, although there were certainly definite but unknown limits
as to what could be achieved. From a personal viewpoint, however, one limitation
was essential in the design: actual questions could not be used if they involved
relevance assessments by people other than the questioners. This restriction had
to be accepted, and the result was the adoption of the technique of using prepared