CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Methods for presentation of results chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

of a, b and c, from the retrieval table. He suggested that the measure should reflect the ability of the system to maximise a relative to b and c, described as the selectivity of the system. The proposed measure, F, uses a normalisation factor S, where S = a + b + c, and

$$F = \frac{100\,\dfrac{a}{S}}{\dfrac{b}{S} + \dfrac{c}{S} + 1}$$

F varies from 0 to 100, and is plotted on a recall/precision plot in Fig. 3.15P. The curves are symmetrical about the diagonal from the bottom left corner to the top right corner, and alter in shape as they approach the top right side.

All the composite measures described have an apparently reasonable scale of values ranging from the case of worst performance to that of best possible performance, but none of these measures can show the very large differences that occur between these two points, in the different positions at which systems actually operate. The curves in Figs. 3.4P and 3.5P are indicators of retrieval performance when a component of a system is varied to give results over the largest possible operating range, but the composite measures can only reflect one, or sometimes two, points of such curves. It is unfortunate that, in examples investigated so far, the point on the curves which determines the highest value assigned to that test by a given composite measure is usually either the point of maximum recall or the point of maximum precision, neither of which may be the best point to use.
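As a small numerical illustration of the measure F discussed in this section, the sketch below computes it from the three retrieval-table counts. It assumes the formula reads F = 100(a/S) / (b/S + c/S + 1) with S = a + b + c, and that a, b and c denote relevant retrieved, non-relevant retrieved and relevant not retrieved documents respectively; the function names are hypothetical and are not taken from the report.

```python
def selectivity_f(a: int, b: int, c: int) -> float:
    """Composite measure F, assuming F = 100*(a/S) / (b/S + c/S + 1)
    with normalisation factor S = a + b + c."""
    s = a + b + c
    if s == 0:
        raise ValueError("a + b + c must be positive")
    return 100.0 * (a / s) / (b / s + c / s + 1.0)


def recall(a: int, c: int) -> float:
    """Recall ratio a / (a + c), assuming c = relevant documents not retrieved."""
    return a / (a + c)


def precision(a: int, b: int) -> float:
    """Precision ratio a / (a + b), assuming b = non-relevant documents retrieved."""
    return a / (a + b)


if __name__ == "__main__":
    # Swapping b and c exchanges recall and precision but leaves F unchanged,
    # which corresponds to curves symmetrical about the diagonal.
    for a, b, c in [(10, 0, 0), (10, 3, 7), (10, 7, 3), (0, 5, 5)]:
        r = recall(a, c) if a + c else float("nan")
        p = precision(a, b) if a + b else float("nan")
        print(f"a={a} b={b} c={c} recall={r:.2f} precision={p:.2f} "
              f"F={selectivity_f(a, b, c):.2f}")
```

Under these assumptions F reaches 100 only when b = c = 0 and falls to 0 when a = 0, in line with the stated range of the measure.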
It is a reasonable conclusion that for experimental tests where changes of the variables in systems are examined, the composite measures so far proposed are inadequate, although for tests where a single cut-off point is chosen, or a single cut-off is applied to two systems in a comparable manner, some of the composite measures may be useful. In experimental tests it is suggested that an 'area measure' is required; a possible solution is put forward in Chapter 5.

Having examined the main suggested performance measures, it may be asked whether any theoretical objective methods are known which could be used to evaluate the proposed measures, or whether tests and experience of actual results will be the only arbiter. The only theoretical basis suggested so far is the use of the 2 x 2 contingency table, as already mentioned. Although the retrieval situation obviously fits the case in the sense that the resulting values of a retrieval test perfectly fit the nine categories in the table, no reasons have been advanced to show that figures from retrieval tests can benefit from the statistical tests commonly used. The retrieval situation is very different from the simple statistical one. For example, a typical 2 x 2 table taken from a popular textbook on statistics by M.J. Moroney (Ref. 11, page 264)