ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Test Design
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
questions based on source documents. Although this technique has been strongly
attacked in many papers, no-one has suggested any other method which would
have permitted so much reliable data to be obtained so economically.* However,
by the time the design of the present project was being considered, the position
had changed radically. The conclusions coming from Cranfield I, supported by
other smaller investigations, had delineated more sharply the problem areas for
investigation; equally important was the realization that progress would be dependent
on the use of more refined test methodology.
As outlined in the previous chapters, the new project was to deal with index
language devices; the first objective was the precise measurement of recall and pre-
cision ratios. The essential prerequisite to obtaining these measures (in an experi-
mental situation) is the determination of the sets of documents which are and are not
relevant to each of a set of test questions. Before proceeding to discuss the various
ways of determining this matter, it may be helpful to consider a recent paper by the
late Dr. Taube, 'The pseudo-mathematics of relevance' (ref. 13), which is being widely
quoted as discrediting the results of the Cranfield investigations.
Any paper by Dr. Taube merited serious consideration, and in particular any
paper dealing with the question of relevance, since this was the critical problem in
the original test carried out by Documentation Inc. While the paper presents what
at first sight appears to be a plausible argument, it is, in fact, based upon a
confusion and distortion of meaning of two uses of the term 'relevance'. First there
is the use of the term on its own where it denotes, in a true life situation, the subjec-
tive assessment of an individual in relation to a document or a set of documents which
he receives in answer to a search question, so that he says "these documents are
relevant to my questions, those other documents are not relevant". The second use
of the term is in 'relevance ratio', which is the manner of expressing the proportion
of relevant documents retrieved to the total of documents retrieved in a search. As
such, 'relevance ratio' has nothing to do with the determination of relevance, but
merely involves a numerical calculation of those documents which have been previously
allocated to one of the two sets of relevant and not relevant.
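The numerical calculation involved is a simple proportion, and can be sketched as follows. (The document identifiers and set sizes below are invented purely to illustrate the arithmetic; they do not correspond to any Cranfield test data.)

```python
# Precision ('relevance') ratio and recall ratio, computed from two
# previously determined sets: the documents judged relevant to a question,
# and the documents retrieved by a search on that question.

relevant = {"d1", "d2", "d3", "d4"}          # hypothetical relevant set
retrieved = {"d2", "d3", "d5", "d6", "d7"}   # hypothetical retrieved set

relevant_retrieved = relevant & retrieved    # documents in both sets

# 'Relevance ratio' (later renamed 'precision ratio'): the proportion of
# retrieved documents which are relevant.
precision = len(relevant_retrieved) / len(retrieved)

# Recall ratio: the proportion of relevant documents which were retrieved.
recall = len(relevant_retrieved) / len(relevant)

print(precision)  # 0.4  (2 of the 5 retrieved documents are relevant)
print(recall)     # 0.5  (2 of the 4 relevant documents were retrieved)
```

As the sketch makes plain, the calculation presupposes that the relevance judgements have already been made; the ratio itself determines nothing about relevance.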
At a meeting in Washington in 1964 of a group of some thirty people concerned,
to a greater or lesser degree, with the evaluation of I.R. systems, the paper in question
(which was originally written in March 1964) was amongst the documents circulated.
Since it was clear from the discussion that Dr. Taube was still confusing the two
meanings, Cleverdon agreed that in future we would cease to use the term 'rele-
vance ratio' and substitute another term. Possible alternatives were 'acceptance
rate' or 'precision ratio', both of which were being used by other groups with
the same meaning as 'relevance ratio'. As stated earlier, 'precision ratio'
was selected, and if one substitutes this term in those cases where Taube
*In these days when large grants are common for small investigations, it is of inter-
est to recall that the five years' work of Cranfield I, including the test of the Metal-
lurgical Index of Western Reserve University, was covered by two grants from the
National Science Foundation, totalling $44,000.