CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 1. Design, Part 1. Text
Chapter: Documents and Questions
Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

the search programmes included every possible word, and a more conventional library search would be made on fewer terms than this, an average of probably 4 to 5.

It is always difficult to prove that any set of questions is really typical, or average in some way, but since each of these questions is a statement of a real need for information that arose in the course of some 180 research projects, they are probably as typical a set as can be obtained outside a real life situation. Many of the questions may have been put to an information service at some stage.

Without the facility to cross-examine the questioner, interpretation of the meaning gave less trouble than expected. A deep knowledge of the subjects would probably have revealed some facts and connections not appreciated, but many replies to the second questionnaire included additional search terms suggested by the authors, and in some cases alternative rephrased questions. An example of the intricacies of the subject is seen in the following comment, made by an author to explain why one of the additionally submitted documents was not relevant to his question:-

"It might seem strange that the paper by Kuchemann and Kettle would be of no use at all in answering my question. This is due to the fact that the influence of end plates is different for streamlined and unstreamlined bodies. In the first case they modify the vortices shed from the tips whereas in the second case they prevent spanwise flow brought about by the blockage of the body. There is no connection between these two effects."

The test design has produced a set of documents which have been assessed as relevant to a set of questions. Since this has not been done in a real life situation, can it be argued that the questions are artificial and the match with the documents unreal? Considerable discussion and argument on these points has taken place in connection with the questions used in Cranfield I and the Western Reserve University test. Although the present question-gathering method did involve a base or 'source' document, it has not been used in the same way as in the previous tests. Previously the questions were framed so that the source document would be a complete answer to the question, but in the present test the question is the real need or research problem that gave rise to the 'source' document being written.

Although the 'source' documents are included in the collection, it is only the cited documents from each 'source' document that are assessed and counted as relevant, with the addition of the extra relevant documents found. The 'source' document for each question is removed from the collection when that question is being tested and does not appear in any of the results at all. There is, therefore, no reason for continuing to argue about the unreality of tests based on source document questions, or to continue to imply that the 'Cranfield test method' necessarily involves the use of such questions.

However, we have stated a belief that source document questions 'can still be used satisfactorily in situations where time and cost are important considerations, as might be the case in an evaluation of a small operational information retrieval system'. This comment was given in a reply to an article by D. R. Swanson, on 'The Evidence Underlying the Cranfield Results' (Ref. 4), in which he emphasised what he called 'the artificial' or 'biased' nature of the relationship of the question to the