CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
of a, b and c, from the retrieval table. He suggested that the measure
should reflect the ability of the system to maximise a relative to b
and c, described as the selectivity of the system. The proposed measure
F uses a normalisation factor S, where S = a + b + c, and

$$F = \frac{100\,(a/S)}{(b/S) + (c/S) + 1}$$

F varies from 0 to 100, and is plotted on a recall/precision plot in
Fig. 3.15P. The curves are symmetrical about the diagonal from the
bottom left corner to the top right corner, and alter in shape as they
approach the top right side.
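
As a minimal sketch of how this measure behaves, the Python below assumes the formula reads F = 100(a/S) / (b/S + c/S + 1) with S = a + b + c, and assumes the usual retrieval-table meanings: a is relevant documents retrieved, b is non-relevant documents retrieved, and c is relevant documents not retrieved. The function names are illustrative only and do not come from the report. Under these assumptions F reduces to 100a / (a + 2b + 2c), or equivalently 100 / (2/R + 2/P - 3) for recall R = a/(a + c) and precision P = a/(a + b). This form is unchanged when R and P are interchanged, which is consistent with curves symmetrical about the diagonal.

```python
def selectivity_f(a, b, c):
    """Assumed reading of the measure: F = 100*(a/S) / (b/S + c/S + 1), S = a + b + c.

    a: relevant retrieved, b: non-relevant retrieved, c: relevant not retrieved
    (assumed meanings of the retrieval-table cells).
    """
    s = a + b + c
    if s == 0:
        raise ValueError("a + b + c must be positive")
    return 100.0 * (a / s) / (b / s + c / s + 1.0)


def f_from_recall_precision(recall, precision):
    """Same quantity written in terms of recall and precision (derived, not from the report)."""
    if recall == 0 or precision == 0:
        return 0.0
    return 100.0 / (2.0 / recall + 2.0 / precision - 3.0)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the symmetry in b and c.
    print(selectivity_f(20, 30, 10))
    print(selectivity_f(20, 10, 30))
    print(f_from_recall_precision(20 / 30, 20 / 50))
```

Interchanging b and c swaps recall and precision but leaves F unchanged. F reaches 100 only when b = c = 0, and falls to 0 when a = 0.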
All the composite measures described have an apparently reasonable
scale of values ranging from the case of worst performance to that of best
possible performance, but none of these measures can show the very large
differences that occur between these two points, in the different positions
at which systems actually operate. The curves in Figs. 3.4P and 3.5P
are indicators of retrieval performance when a component of a system is
varied to give results over the largest possible operating range, but the
composite measures can only reflect one, or sometimes two, points of
such curves. It is unfortunate that, in examples investigated so far,
the point on the curves which determines the highest value assigned to
that test by a given composite measure is usually either the point of
maximum recall, or of maximum precision, neither of which may be the
best point to use. It is a reasonable conclusion that for experimental
tests where changes of the variables in systems are examined, the
composite measures so far proposed are inadequate, although for tests
where a single cut-off point is chosen, or a single cut-off is applied to
two systems in a comparable manner, some of the composite measures may
be useful. In experimental tests it is suggested that an 'area measure' is
required; a possible solution is put forward in Chapter 5.
Having examined the main suggested performance measures, it may be
asked whether any theoretical objective methods are known which could be
used to evaluate the proposed measures, or whether tests and experience
of actual results will be the only arbiter.
The only theoretical basis suggested so far is the use of the 2 x 2
contingency table, as already mentioned. Although the retrieval situation
obviously fits the case in the sense that the resulting values of a retrieval
test perfectly fit the nine categories in the table, no reasons have been
advanced to show that figures from retrieval tests can benefit from the
statistical tests commonly used. The retrieval situation is very different
from the simple statistical one. For example, a typical 2 x 2 table taken
from a popular textbook on statistics by M.J. Moroney (Ref. 11, page 264)