CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A comparison of the recall ratio with fallout ratio can be made in
the same way. We are not aware of any previous occasions when the
fallout ratio has been used for presenting test results, although Swets
(Ref. 4) has discussed its possible use. In that it measures the ratio
of the non-relevant retrieved to the total non-relevant in the collection,
$b/(b+d)$, it is very sensitive to N, the total number of documents in the
collection. While it might not be found to be particularly satisfactory
for tests on operational systems, it has an attraction in experimental
testing where collections of different but known size are being tested,
since it automatically compensates for the changes in size. Fig. 3.5T
takes the figures of Fig. 3.3T and Fig. 3.4T and replaces the precision
ratio by fallout ratio. A characteristic of fallout ratios is that they tend
to be concentrated at low numbers; for this reason the figures are taken
to three places of decimals and the resultant plot of recall ratio against
fallout ratio is clearer if made on a semi-log scale, as in Fig. 3.5P.
In this case the better performance is obtained when the curve is nearer
the top left-hand corner, whereas the recall/precision curve is optimised
towards the top right-hand corner. Therefore, as in Fig. 3.4P, search
Y is shown to give a generally improved performance over search X.
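
As an illustration of how the three ratios relate, the following is a minimal sketch in Python. It uses the a, b, c, d notation of the text (a = relevant retrieved, b = non-relevant retrieved, c = relevant not retrieved, d = non-relevant not retrieved). The function name `ratios` and all the counts are hypothetical and are not taken from Fig. 3.3T to Fig. 3.5T.

```python
def ratios(a, b, c, d):
    """Return recall, precision and fallout for one 2 x 2 retrieval table.

    a = relevant retrieved,      b = non-relevant retrieved,
    c = relevant not retrieved,  d = non-relevant not retrieved.
    """
    return {
        "recall": a / (a + c),
        "precision": a / (a + b),
        "fallout": b / (b + d),
    }


# Hypothetical counts: the same retrieved set, two collection sizes.
small = ratios(a=8, b=32, c=2, d=158)    # N = a + b + c + d = 200
large = ratios(a=8, b=32, c=2, d=1358)   # N = 1400

for name, result in (("N = 200", small), ("N = 1400", large)):
    # Fallout is reported to three places of decimals, as in the text.
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Only d differs between the two calls, so recall and precision are unchanged while fallout falls as the collection grows. This is the dependence on N that the paragraph describes.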
Either of these twin measures is satisfactory for presenting the
performance of systems where the generality number is held constant,
although the argument has been advanced that a plot of recall/precision
is not valid since both ratios contain a (relevant retrieved). It has been
incorrectly argued that in plotting $a/(a+c)$ against $a/(a+b)$ all the a's
cancel out, with the result that the factors being plotted are c against b.
Fairthorne (Ref. 5) has said that a more reliable precision ratio is
given by what he calls the 'distillation ratio', which is $a/(a+b) - c/d$.
However, he agrees that when the correction factor of $c/d$ is negligible
compared with the precision ratio, the latter is a valid measure. In fact,
in the results presented in Fig. 3.3T, the correction factor at the
coordination level of five terms is 0.0038, which can definitely be
considered negligible.
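
The size of the correction factor relative to the precision ratio can be checked with a short Python sketch. It assumes that the correction factor c/d is subtracted from the precision ratio to give the distillation ratio; that reading is an assumption here. The function names and counts are hypothetical and do not reproduce the figures of Fig. 3.3T.

```python
def precision_ratio(a, b):
    """Relevant retrieved over total retrieved."""
    return a / (a + b)


def correction_factor(c, d):
    """Relevant not retrieved over non-relevant not retrieved."""
    return c / d


def distillation_ratio(a, b, c, d):
    # Assumption: the correction factor is subtracted from precision.
    return precision_ratio(a, b) - correction_factor(c, d)


# Hypothetical counts for one search in a collection of 1400 documents.
a, b, c, d = 8, 32, 2, 1358

p = precision_ratio(a, b)
k = correction_factor(c, d)
print(f"precision ratio    = {p:.4f}")
print(f"correction factor  = {k:.4f}")
print(f"distillation ratio = {distillation_ratio(a, b, c, d):.4f}")
```

With d large compared with c, as it is whenever most of the collection is non-relevant and not retrieved, the correction factor is small and the two ratios nearly coincide.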
Rees (Ref. 6) argues against precision ratio in favour of a measure
that is complementary to fallout, namely $d/(b+d)$, on the grounds that it
takes into account one of the vital parameters in a retrieval system - size
of file. To some extent this is true, but it is a matter which has to be
approached very carefully. The difficulty lies in determining exactly what
is the correct value of N, that is to say how many documents can validly
be considered to form the total collection in regard to any question. This
matter is considered in more detail later in this chapter. It is true that
the same difficulty arises in calculating the generality number, but if N
is known, then it is just as easy to calculate the generality number as to