CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results (chapter)
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Other proposed composite measures can be described as non-linear composite measures. Their values vary non-linearly as recall, precision, or fallout are varied, so plotting them on the twin variable plots produces curves rather than straight lines. When a measure of this type includes d (the number of non-relevant documents not retrieved) in its equation, its values and curves will depend on the generality number. In Figs. 3.11 to 3.15 the curves for these measures are drawn for a generality of 5.0, because the plotted performance results of searches X and Y were obtained at that generality.

The values of a composite measure of this type have been calculated in the same way as for the two combined plots of recall, precision and fallout, Figs. 3.7P and 3.8P. Sets of recall and fallout ratios, and of recall and precision ratios (at a generality of 5.0), were selected in advance, and the resulting value of the composite measure was calculated for each. Repeating this over a range of ratios gives curves of the measure that indicate the general range of its values.

The first of these non-linear composite measures that we consider is the 'Measure of Merit' proposed by J. Verhoeff and others (Ref. 9), with the basic equation

$$M = a - b - c + d$$

This can also be written as $M = (a + d) - (b + c)$, which is simply the sum of the 'successes' minus the sum of the 'failures'. The values are shown in the two twin variable plots, Figs.
3.11P and 3.12P, with the equations divided by N to obtain a range of values between 0 and 1. It can be seen that high values of the measure occur at high recall with high precision or, equivalently, at high recall with low fallout. The measure was intended to be used with weights attached to each of the four component values, and any of the composite measures described here could incorporate such weights if a meaningful set can be devised for a given situation. One might, for instance, hypothesise 'cost values' for failing to retrieve a relevant document or for retrieving a non-relevant one. Any such weighting would shift the position of the measure's curves on the plots.

A more complex measure of this kind is the Q factor, which Farradane has suggested as suitable for use in retrieval tests. This is a statistical coefficient of association proposed by Yule (Ref. 10). The formula is

$$Q = \frac{ad - bc}{ad + bc}$$

which can be described as the product of the successes minus the product of the failures, divided by the sum of the same two products. Figs. 3.13P and 3.14P show the two graphs with Q curves plotted alongside the performance curves. It has not been shown that Q curves have any significance in retrieval tests, and there does not appear to be any reason why they should.

A measure put forward in discussion by Vickery at the NATO Advanced Study Institute on Evaluation, held at The Hague in July 1965, uses the values