CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
The first method, as used in Cranfield I, involves obtaining total
figures of the numbers of documents involved for the whole set of
questions being used in the test, and then converting the one grand
total into, say, recall and precision ratios. In the case of the 35
question set, a total of 287 relevant documents is sought; at a
coordination level of 3+, 157 of the relevant documents are retrieved,
together with 2,865 non-relevant documents. These totals are then
used to calculate the ratios of:-
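\[
\text{Recall} = \frac{100a}{a + c} = \frac{157}{287} \times 100 = 54.7\%
\]
\[
\text{Precision} = \frac{100a}{a + b} = \frac{157}{157 + 2865} \times 100 = 5.2\%
\]
\[
\text{Fallout} = \frac{100b}{b + d} = \frac{2865}{(35 \times 1400) - 287} \times 100 = 5.9\%
\]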
These ratios are obtained for all of the seven possible coordination levels,
and can then be plotted as points on a graph. While this procedure of
averaging the numbers was used for presenting the results of the first
Aslib-Cranfield Project and the Western Reserve University test, at
the time of the latter test it was realised that this method results in
certain searches affecting the final figures more than others. Non-
typical questions, such as those which retrieve an exceptionally large
number of non-relevant documents, will exert a disproportionate influence
on the final figures, and, in the W.R.U. test, separate figures were given
showing the change in performance when those questions that retrieved
unusually large numbers of (mainly) non-relevant documents were deleted
(Ref. 2, page 13).
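
The grand-total calculation described above can be checked with a short sketch. The meanings given to a, b, c and d below (relevant retrieved, non-relevant retrieved, relevant not retrieved, non-relevant not retrieved) are inferred from the way the totals are used in the formulas, and the reading of "35 x 1400" as 35 questions each searched against a collection of 1,400 documents is likewise an assumption. The names `Totals` and `ratios_from_totals` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Totals:
    """Grand totals over the whole question set (a 2x2 table)."""
    a: int  # relevant documents retrieved
    b: int  # non-relevant documents retrieved
    c: int  # relevant documents not retrieved
    d: int  # non-relevant documents not retrieved


def ratios_from_totals(t: Totals) -> dict:
    """Return recall, precision and fallout as percentages."""
    return {
        "recall": 100 * t.a / (t.a + t.c),
        "precision": 100 * t.a / (t.a + t.b),
        "fallout": 100 * t.b / (t.b + t.d),
    }


# Figures quoted in the text for the 35-question set at coordination level 3+.
questions, collection_size = 35, 1400   # assumed reading of "35 x 1400"
relevant_sought = 287
a, b = 157, 2865
c = relevant_sought - a                  # relevant documents missed
d = questions * collection_size - relevant_sought - b

result = ratios_from_totals(Totals(a, b, c, d))
print({name: round(value, 1) for name, value in result.items()})
```

Rounded to one decimal place, the three values agree with the 54.7%, 5.2% and 5.9% given in the text.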
The second method of merging a set of results first converts the
results of individual questions into recall, precision or fallout ratios and
then obtains the final figures by using the average of the ratios of each
question. In Fig. 3.18T are given the results of 35 questions which have
been calculated in both ways, thus enabling a comparison of the 'average
of numbers' and 'average of ratios' methods for these particular results.
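
As a minimal sketch of how the two methods can diverge, the following compares them on four invented questions. The counts are hypothetical and are not taken from the Cranfield data; the fourth question is made deliberately atypical, retrieving a large number of non-relevant documents. The function names, and the choice to score a question that retrieves nothing as 0% precision, are assumptions of this sketch.

```python
from statistics import mean


def pct(numerator, denominator):
    """Percentage, with 0.0 for an empty denominator (an assumption here)."""
    return 100 * numerator / denominator if denominator else 0.0


# Hypothetical per-question counts:
# (relevant retrieved, non-relevant retrieved, relevant documents sought)
questions = [
    (4, 6, 5),
    (3, 7, 6),
    (2, 2, 4),
    (5, 195, 8),  # atypical question: many non-relevant documents retrieved
]


def average_of_numbers(qs):
    """Sum the counts over all questions, then form one pair of ratios."""
    a = sum(q[0] for q in qs)
    b = sum(q[1] for q in qs)
    sought = sum(q[2] for q in qs)
    return pct(a, sought), pct(a, a + b)


def average_of_ratios(qs):
    """Form the ratios for each question, then average them."""
    recalls = [pct(a, sought) for a, _, sought in qs]
    precisions = [pct(a, a + b) for a, b, _ in qs]
    return mean(recalls), mean(precisions)


numbers_recall, numbers_precision = average_of_numbers(questions)
ratios_recall, ratios_precision = average_of_ratios(questions)
print(f"average of numbers: recall {numbers_recall:.3f}, precision {numbers_precision:.3f}")
print(f"average of ratios:  recall {ratios_recall:.3f}, precision {ratios_precision:.3f}")
```

With these invented counts the two recall figures are close (about 60.9% against 60.6%), while precision is 6.25% for the average of numbers and about 30.6% for the average of ratios, because the single atypical question dominates the summed counts.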
Recall, fallout and precision ratios for the two methods are compared in
tabular form. It can be seen that there is no significant difference in the
recall ratios between the two methods; at some coordination levels the
average of ratios gives a slightly higher recall ratio, and at other levels
the opposite is the case. The fallout values also show no significant
difference. However, in the case of the precision ratios, it is clearly
seen that the average of ratios gives a substantially higher figure for all
coordination levels. Fig. 3.19P is a recall/precision plot of the two methods,
where the 'better' curve results from averaging the ratios. As can be
seen from the tables, a recall/fallout plot would have virtually overlapping