CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
the search programmes included every possible word, and a more conventional library
search would be made on fewer terms than this, an average of probably 4 to 5.
It is always difficult to prove that any set of questions is really typical, or
average in some way, but since each of these questions is a statement of a real need
for information that arose in the course of some 180 research projects, they are
probably as typical a set as can be obtained outside a real life situation. Many of the
questions may have been put to an information service at some stage.
Even without the facility to cross-examine the questioner, interpretation of the
meaning gave less trouble than expected. A deep knowledge of the subjects would probably
have revealed some facts and connections not appreciated, but many replies to the
second questionnaire included additional search terms suggested by the authors, and
in some cases alternative rephrased questions. An example of the intricacies of the
subject is seen in the following comment, made by an author to explain why one of
the additionally submitted documents was not relevant to his question:
"It might seem strange that the paper by
Kuchemann and Kettle would be of no use at all in
answering my question. This is due to the fact
that the influence of end plates is different for
streamlined and unstreamlined bodies. In the first case
they modify the vortices shed from the tips whereas
in the second case they prevent spanwise flow brought
about by the blockage of the body. There is no
connection between these two effects."
The test design has produced a set of documents which have been assessed as
relevant to a set of questions. Since this has not been done in a real life situation,
can it be argued that the questions are artificial and the match with the documents
unreal?
Considerable discussion and argument on these points has taken place in
connection with the questions used in Cranfield I and the Western Reserve University
test. Although the present question-gathering method did involve a base or 'source'
document, it has not been used in the same way as in the previous tests. Previously
the questions were framed so that the source document would be a complete answer
to the question, but in the present test the question is the real need or research
problem that gave rise to the 'source' document being written. Although the 'source'
documents are included in the collection, it is only the cited documents from each
'source' document that are assessed and counted as relevant, with the addition of
the extra relevant documents found. The 'source' document for each question is
removed from the collection when that question is being tested and does not appear
in any of the results at all. There is, therefore, no reason for continuing to argue
about the unreality of tests based on source document questions, or to continue to
imply that the 'Cranfield test method' necessarily involves the use of such questions.
However, we have stated a belief that source document questions 'can still be used
satisfactorily in situations where time and cost are important considerations, as might
be the case in an evaluation of a small operational information retrieval system'.
This comment was given in a reply to an article by D. R. Swanson, on 'The
Evidence Underlying the Cranfield Results' (Ref. 4), in which he emphasised what
he called 'the artificial' or 'biased' nature of the relationship of the question to the