CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
curves.
In the tests at Cranfield and in other tests where sufficient data
has been available, the samples which have been processed by both
methods have always shown this increase in precision with recall
remaining much the same. However, we do not wish to be misquoted
on this point and would emphasize that while it is probably true that
the average of ratios will usually give a better performance figure,
it would be wrong to assume that the proportional improvement would
always be as pronounced as in the example shown.
An evaluation showing one of the two methods to be superior is not
possible, since proponents of each can give good reasons for preferring
it to the other.
The theoretical cause of the discrepancy is the variation in the base
from question to question: in the case of the recall ratio it is the
number of relevant documents sought; in the precision ratio it is
the total retrieved; and in the fallout ratio it is the total non-relevant.
The average of numbers method weights the results of individual questions
according to the base, and a larger base exerts a greater influence on
the final result. The average of ratios completely ignores the base
variation. In situations outside retrieval tests, where similar data has
to be averaged, it is frequently advocated that the variation in base should
be allowed for, and the average of numbers used (see, for instance, Ref.
12, page 161). The difference in the results of the two methods is small
except when the range and distribution of the variation in base becomes
large, as is often the case with the precision ratio. However, both
methods appear to be equally reasonable for use in retrieval situations,
and the different results are really complementary viewpoints requiring
careful interpretation.
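In later terminology the average of ratios corresponds to what is often called macro-averaging, and the average of numbers to micro-averaging. The following sketch in Python (the counts are invented for illustration and are not taken from the Cranfield data) shows how variation in the base separates the two figures when applied to the precision ratio:

```python
# Illustrative sketch (invented figures): the two ways of averaging
# per-question precision ratios.  Each question contributes a pair
# (relevant_retrieved, total_retrieved); total_retrieved is the base.
questions = [
    (4, 5),     # small base: 4 relevant among 5 documents retrieved
    (30, 100),  # large base: 30 relevant among 100 documents retrieved
]

# Average of ratios: form each question's precision ratio, then average.
# The base is ignored, so both questions count equally.
average_of_ratios = sum(r / n for r, n in questions) / len(questions)

# Average of numbers: pool the counts first, then form a single ratio.
# The question with the larger base dominates the result.
average_of_numbers = sum(r for r, _ in questions) / sum(n for _, n in questions)

print(f"average of ratios:  {average_of_ratios:.3f}")   # (0.800 + 0.300) / 2 = 0.550
print(f"average of numbers: {average_of_numbers:.3f}")  # 34 / 105 = 0.324
```

With only two questions the gap is already large (0.550 against 0.324); as the text notes, the gap shrinks when the bases are of similar size.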
A description of the different viewpoints represented by the two
methods has been given by Salton (Ref. 13). He suggests that the average
of ratios is a 'query-oriented viewpoint', and the average of numbers is
a 'document-oriented viewpoint'; performance figures using the average
of ratios indicate the performance of a single typical search question,
typical, that is, of the set of questions used in the test. The use of the
average of numbers indicates the result for the whole set of questions
taken together, or the success in retrieving a given set of relevant documents
(287 in the example being used). This really ignores the actual individual
questions involved, since one question with 287 relevant documents could
in theory have the same result as 35 questions having in total 287 relevant
documents. Thus the average of numbers gives an arithmetical mean
value for a set of questions, and the average of ratios gives what
approximates to a 'median' value which reflects the performance of a
typical question.
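The point about the 287 relevant documents can be checked numerically. In the sketch below only the total of 287 comes from the text; the way the documents are distributed over 35 questions, and the number retrieved, are invented for illustration. The average of numbers for recall is identical whether the relevant documents belong to one question or to 35, while the average of ratios changes:

```python
# Hypothetical illustration: the average of numbers depends only on totals.
# Each question is a pair (relevant_retrieved, relevant_sought).

def average_of_numbers(questions):
    # Pool the counts over all questions, then take one ratio.
    return sum(r for r, _ in questions) / sum(s for _, s in questions)

def average_of_ratios(questions):
    # Take each question's recall ratio, then average the ratios.
    return sum(r / s for r, s in questions) / len(questions)

# Case A: a single question seeking all 287 relevant documents, 200 retrieved.
one_question = [(200, 287)]

# Case B: the same 287 relevant documents spread over 35 questions, again
# with 200 retrieved in total (28*8 + 7*9 = 287; 168 + 20 + 12 = 200).
thirty_five_questions = [(6, 8)] * 28 + [(5, 9)] * 4 + [(4, 9)] * 3

print(average_of_numbers(one_question))        # 200/287, about 0.697
print(average_of_numbers(thirty_five_questions))  # identical: 200/287
print(average_of_ratios(thirty_five_questions))   # about 0.702 (not 0.697)
```

The average of numbers cannot distinguish the two cases, which is exactly the sense in which it ignores the individual questions; the average of ratios reflects the per-question behaviour and so shifts once the 287 documents are divided up.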
Neither method appears to have any marked superiority over the
other as a means of presenting results. However, the decision to use
the average of numbers method in this volume was based on a most