ASLIB Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 1: Design, Part 1: Text
Test Design

Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

used the term 'relevance' with this meaning, it is immediately apparent that the whole argument is defective. The argument in the paper starts with a quotation from a Cranfield paper written before this decision to change to the term 'precision ratio' had been taken. Substituting this term, but not in any way changing the original meaning, we would now have written: "With the aid of the set of documents and the set of questions [for which the document/question relevance assessments have been previously made by the questioner] it will be possible to test each index language device in turn and so get precise figures for the effect on recall and precision ratios."

Taube's comment on this was "some way or another a vague or hardly recognisable and admittedly difficult notion [i.e. relevance] has turned out to be precisely measurable". It is not, of course, relevance which is being measured, but the decisions regarding relevance which have already been taken. As Salton says (Ref. 14), "once acceptable relevance judgements are available for all documents with respect to all search requests, the calculation of recall and precision becomes perfectly straightforward and unambiguous."

It is interesting to find, in the issue of American Documentation for April 1965, that there is a brief note (Ref. 15) by two members of the staff of Documentation Inc., in which they discuss a NASA Search System Analysis Sheet. The example which they presented has been reproduced on page 12, and from this it can be seen that these members of the staff of Documentation Inc.
have been able to derive, for this particular search, an acceptance rate (i.e. precision ratio or relevance ratio) of 86.5%.* It is interesting to note that, on the Analysis Sheet, the phrase used is 'accepted hits after editing'. This implies that the determination of the relevance of the document to the question has been made by a member of the staff of Documentation Inc., and his standard for relevance might be very different from that of the questioner.

This leads us back to the point we had reached before the diversion to consider briefly the matter of relevance. As we argued earlier, there were sound, compelling reasons for the use of source-document questions in Cranfield I, because they gave, simply and economically, unequivocal relevance assessments. More particularly, it still remains probably the most effective and economical method of establishing the general recall ratio in many test situations. By 1961, however, it was quite unacceptable for an experimental investigation of the type we had in mind. What were the alternatives? These can most simply be tabulated under various aspects as follows.

Types of search questions

1. An actual question that is put to an information retrieval system and searched at the time it is required.

2. An actual question that has been put to an I.R. system. In other words, one obtains questions that have been used previously, either with the system being tested or some other system.

*To save misunderstanding, we would point out that an error has been made in calculating this figure. It should, of course, be 85.9%.
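Once the relevance decisions are fixed, the two ratios discussed above reduce to simple proportions: the precision (or acceptance) ratio is the fraction of retrieved documents judged relevant, and the recall ratio is the fraction of all relevant documents that were retrieved. The following is a minimal sketch of these calculations; the counts used are purely hypothetical illustrations, since the actual totals from the NASA Analysis Sheet are not reproduced in this passage.

```python
# Precision (acceptance) ratio and recall ratio computed from fixed
# relevance judgements. The counts are hypothetical, for illustration
# only; they are not the figures from the NASA Analysis Sheet.

def precision_ratio(relevant_retrieved, total_retrieved):
    """Proportion of the retrieved documents that were judged relevant."""
    return relevant_retrieved / total_retrieved

def recall_ratio(relevant_retrieved, total_relevant):
    """Proportion of all relevant documents that the search retrieved."""
    return relevant_retrieved / total_relevant

# Hypothetical search: 200 documents retrieved, of which 170 were
# judged relevant; 250 relevant documents exist in the collection.
print(round(100 * precision_ratio(170, 200), 1))  # 85.0 (per cent)
print(round(100 * recall_ratio(170, 250), 1))     # 68.0 (per cent)
```

As the text stresses, these figures are only as trustworthy as the relevance judgements behind them: the arithmetic is unambiguous, but the counts depend entirely on who decided which hits were 'accepted'.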