CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Other composite measures proposed can be described as non-linear
composite measures, since their scale of values varies in a non-linear
fashion when recall, precision, or fallout are varied, and the display of
their values on the twin variable plots results in curves rather than
straight lines. When a measure of this type includes d (non-relevant not
retrieved) in its equation, the values and curves of the measure will
be affected by the generality number. For Figs. 3.11 to 3.15 a
generality of 5.0 is used in drawing the curves for the measures involved,
since the performance results of searches X and Y that are plotted were
obtained in a situation of that generality. The values of a composite
measure of this type have been calculated in a manner similar to that
adopted in making the two combined plots of recall, precision and fallout,
Figs. 3.7P and 3.8P. In this case various sets of recall and fallout
ratios, and also recall and precision ratios (at a generality of 5.0) were
selected in advance and the resulting value of the composite measure
calculated. This was done for different ratios to obtain curves of the
measure that give a general indication of the range in its values.
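The calculation described above can be sketched in code. This is only an illustrative sketch, not the procedure used for the figures: it assumes that recall, precision and fallout are given as fractions between 0 and 1, that the generality number is the count of relevant documents per 1000 documents in the collection, and that b is non-relevant retrieved and c is relevant not retrieved. The function names and the collection size n are hypothetical.

```python
def table_from_recall_fallout(recall, fallout, generality, n=1000.0):
    """Return (a, b, c, d) from recall and fallout at a given generality.

    Assumption: generality = relevant documents per 1000 documents.
    """
    relevant = n * generality / 1000.0
    non_relevant = n - relevant
    a = recall * relevant          # relevant, retrieved
    c = relevant - a               # relevant, not retrieved
    b = fallout * non_relevant     # non-relevant, retrieved
    d = non_relevant - b           # non-relevant, not retrieved
    return a, b, c, d


def table_from_recall_precision(recall, precision, generality, n=1000.0):
    """Return (a, b, c, d) from recall and precision (precision > 0)."""
    relevant = n * generality / 1000.0
    non_relevant = n - relevant
    a = recall * relevant
    c = relevant - a
    # precision = a / (a + b), so b = a * (1 - precision) / precision
    b = a * (1.0 - precision) / precision
    d = non_relevant - b
    return a, b, c, d
```

Any composite measure built from a, b, c and d can then be evaluated over a grid of such ratio pairs to trace its curves at the chosen generality.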
The first of these non-linear composite measures which we consider
is that proposed by J. Verhoeff and others, which is described as a
'Measure of Merit' (Ref. 9), with the basic equation:
M = a - b - c + d
This can also be written as M = (a + d) - (b + c) which is really the sum
of the 'successes' minus the sum of the 'failures'. The values are shown
in the two twin variable plots, Figs. 3.11P and 3.12P, with the equations
divided by 'N' to obtain a range of values between 0 and 1, and it can be
seen how high values of the measure occur at high recall with high precision
or, to say the same thing in a different way, high recall with low fallout.
The measure was intended to be used with various weights associated with
the four component values, and any of the composite measures being described
could incorporate this if in a given situation a meaningful set of weights can
be devised. One might, for instance, hypothesise 'cost values' of failing
to retrieve a relevant document or retrieving a non-relevant document. Any
such weighting would alter the position of the measure's curves on the plots.
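As a minimal sketch of the measure just described, the following computes M divided by N, with optional weights on the four component values. The function name, the weights argument and the choice to divide the weighted sum by the unweighted N are assumptions made for illustration; they are not taken from Ref. 9.

```python
def measure_of_merit(a, b, c, d, weights=(1.0, 1.0, 1.0, 1.0)):
    """Return (wa*a - wb*b - wc*c + wd*d) / N, where N = a + b + c + d.

    With unit weights this is ((a + d) - (b + c)) / N,
    the 'successes' minus the 'failures', divided by N.
    """
    wa, wb, wc, wd = weights
    n = a + b + c + d
    if n == 0:
        raise ValueError("empty collection: a + b + c + d must be positive")
    return (wa * a - wb * b - wc * c + wd * d) / n


# Illustrative values only: 4 relevant retrieved, 4 non-relevant retrieved,
# 1 relevant missed, 991 non-relevant not retrieved.
unweighted = measure_of_merit(4, 4, 1, 991)
# Hypothetical 'cost' weighting: a missed relevant document counts five times.
weighted = measure_of_merit(4, 4, 1, 991, weights=(1.0, 1.0, 5.0, 1.0))
```

Raising the weight on c lowers the value for the same search result, which is one way the curves would shift on the plots.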
A more complex version of this is the Q factor, which has been
suggested by Farradane as suitable for use in retrieval tests. This is a
statistical coefficient of association proposed by Yule (Ref. 10). The formula
is Q = (ad - bc) / (ad + bc), which can be described as the product of the
successes minus the product of the failures, divided by the sum of the same
two products. Figs.
3.13P and 3.14P show the two graphs with Q curves plotted, with the
performance curves. It has not been shown that Q curves have any significance
in retrieval tests, and there does not appear to be any reason why they
should.
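For reference, the Q coefficient can be computed from the four values as in the sketch below. The function name is hypothetical, and raising an error when ad + bc is zero is a choice made here for illustration, since the quotient is undefined in that case.

```python
def yule_q(a, b, c, d):
    """Return Q = (ad - bc) / (ad + bc) for a 2x2 retrieval table."""
    successes = a * d   # product of the 'successes'
    failures = b * c    # product of the 'failures'
    total = successes + failures
    if total == 0:
        raise ValueError("Q is undefined when ad + bc is zero")
    return (successes - failures) / total
```

Q is +1 when either b or c is zero, and 0 when ad equals bc, so a search with no non-relevant documents retrieved scores +1 however few relevant documents it finds.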
A measure put forward in discussion by Vickery at the NATO Advanced
Study Institute on Evaluation, held at The Hague, July 1965, uses the values