CRANV1P1
Aslib Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 1: Design, Part 1: Text

Test Design

Cyril Cleverdon
Jack Mills
Michael Keen

Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

gone according to plan. The final stage was intended to be a comparison of the output of the two sets of searches, in order to find which system had been successful in obtaining more relevant documents. The problem which arose at this final stage was that neither group was willing to accept the relevance assessments of the other group; rumour has it that at the end of the second day of discussion, the two groups were still arguing about the meaning of the first search question. No real blame can be fixed on those who organised the test; in 1952 it was not unreasonable to think that two groups of intelligent people would, without serious difficulty, be able to come to an amicable agreement as to which documents were relevant to a particular question. If any fault can be found, it lies only in the failure to make generally available either of the two reports which are said to have been prepared by the two groups taking part in the test. The only published account was a brief paper by Gull which appeared some years later in American Documentation (reference 10), and which dealt mainly with the results of the searches. Gull does, however, make the following very apt comment:

"When one considers that a fairly thorough search of the literature indicates that this comparison of two reference systems is the first undertaken so far, it is not surprising that the results revealed clerical errors and an incomplete design of the test."

With the exception of a small test done in 1953 by Cleverdon and Thorne (ref. 11), this had been the only test of an I.R.
system carried out before the test design for Cranfield I was prepared in 1956. While access to the complete reports of the ASTIA-Uniterm test might have revealed some more information, the only positive fact known in 1956 concerning the test design of I.R. systems was that failure to secure a firm agreement on question-document relevance could result in complete failure to realise the test objectives.

Concerning information retrieval systems themselves, however, nothing was known for certain. For any belief categorically stated by one expert, it was possible to find the exact opposite stated by another expert. Those were, in fact, the halcyon days when one could argue all night without producing a shred of evidence for one's views; when Metcalfe, for instance, could write a fascinating book (ref. 12) proving in three hundred pages that an alphabetical subject catalogue was vastly superior to a classified catalogue, without having to, or being able to, present one piece of experimental data to support any of his many assertions.

The field of investigation for Cranfield I was therefore wide open, in the sense that it would prove or disprove some conflicting beliefs. Since it was uncertain what was of major importance, the decision was deliberately taken to plan the test over a wide range of aspects. Not only index languages but also the qualifications of indexers, indexing time, categories of documents, search tactics and search capability were all, optimistically (over-optimistically, some might argue), incorporated in the test design. Any knowledge would be new knowledge, and there was practically no limit to what could be attempted, although there were certainly definite but unknown limits to what could be achieved. From a personal viewpoint, however, one limitation was essential in the design: actual questions could not be used if these involved relevance assessments by people other than the questioners.
This restriction had to be accepted, and the result was the adoption of the technique of using prepared