ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Test Design
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
used the term 'relevance' with this meaning, it is immediately apparent that the whole
argument is defective. The argument in the paper starts with a quotation from a
Cranfield paper written before this decision to change to the term 'precision ratio'
had been taken. Substituting this term, but not in any way changing the original
meaning, we would now have written, "With the aid of the set of documents and the set
of questions [for which the document/question relevance assessments have been
previously made by the questioner] it will be possible to test each index language
device in turn and so get precise figures for the effect on recall and precision ratios."
Taube's comment on this was "some way or another a vague or hardly recognisable
and admittedly difficult notion [i.e. relevance] has turned out to be precisely
measurable". It is not, of course, relevance which is being measured, but the
decisions regarding relevance which have already been taken. As Salton says (Ref. 14),
"once acceptable relevance judgements are available for all documents with respect
to all search requests, the calculation of recall and precision becomes perfectly
straightforward and unambiguous."
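The straightforward calculation Salton describes can be sketched in modern terms as set arithmetic over the pre-made judgements. The function name and the document identifiers below are invented purely for illustration; only the definitions of the two ratios are taken from the text.

```python
def recall_precision(retrieved, relevant):
    """Return (recall ratio, precision ratio) for one search.

    retrieved -- set of document IDs returned by the search
    relevant  -- set of document IDs the questioner judged relevant
    """
    hits = retrieved & relevant  # relevant documents actually retrieved
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

# Hypothetical search: 8 documents retrieved, 10 judged relevant, 6 in common.
retrieved = {1, 2, 3, 4, 5, 6, 7, 8}
relevant = {1, 2, 3, 4, 5, 6, 11, 12, 13, 14}
r, p = recall_precision(retrieved, relevant)
print(f"recall = {r:.1%}, precision = {p:.1%}")  # recall = 60.0%, precision = 75.0%
```

Once the relevance decisions exist as data, the ratios follow mechanically; the difficult, contestable step is the judgement itself, which is the point of the argument above.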
It is interesting to find, in the issue of American Documentation for April 1965,
a brief note (Ref. 15) by two members of the staff of Documentation Inc., in which
they discuss a NASA Search System Analysis Sheet. The example which they presented
has been reproduced on page 12, and from this it can be seen that these members of
the staff of Documentation Inc. have been able to derive, for this particular
search, an acceptance rate (i.e. precision ratio or relevance ratio) of 86.5%.*
It is worth noting that, on the Analysis Sheet, the phrase used is 'accepted hits
after editing'. This implies that the determination of the relevance of the document
to the question has been made by a member of the staff of Documentation Inc., and his
standard for relevance might be very different from that of the questioner. This leads us
back to the point we had reached before the diversion to consider briefly the matter of
relevance. As we argued earlier, there were sound, compelling reasons for the use
of source-document questions in Cranfield I, because they gave, simply and
economically, unequivocal relevance assessments. More particularly, this still
remains probably the most effective and economical method of establishing the
general recall ratio in many test situations. By 1961, however, it was quite
unacceptable for an experimental investigation of the type we had in mind. What
were the alternatives? These can most simply be tabulated under various aspects
as follows.
Types of search questions
1. An actual question that is put to an information retrieval system and searched at
the time it is required.
2. An actual question that has been put to an I.R. system. In other words, one obtains
questions that have been used previously, either with the system being tested or some
other system.
*To save misunderstanding, we would point out that an error has been made in
calculating this figure. It should, of course, be 85.9%.