CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Methods for presentation of results chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The first method, as used in Cranfield I, involves obtaining total figures of the numbers of documents involved for the whole set of questions being used in the test, and then converting the one grand total into, say, recall and precision ratios. In the case of the 35 question set, a total of 287 relevant documents is sought; at a coordination level of 3+, 157 of the relevant documents are retrieved, together with 2,865 non-relevant documents. These totals are then used to calculate the ratios:

\[
\text{Recall} = \frac{100a}{a + c} = \frac{157}{287} \times 100 = 54.7\%
\]

\[
\text{Precision} = \frac{100a}{a + b} = \frac{157}{157 + 2865} \times 100 = 5.2\%
\]

\[
\text{Fallout} = \frac{100b}{b + d} = \frac{2865}{(35 \times 1400) - 287} \times 100 = 5.9\%
\]

These ratios are obtained for all of the seven possible coordination levels, and can then be plotted as points on a graph.

While this procedure of averaging the numbers was used for presenting the results of the first Aslib-Cranfield Project and the Western Reserve University test, at the time of the latter test it was realised that this method results in certain searches affecting the final figures more than others. Non-typical questions, such as those which retrieve an exceptionally large number of non-relevant documents, will exert a disproportionate influence on the final figures, and, in the W.R.U. test, separate figures were given showing the change in performance when those questions that retrieved unusually large numbers of (mainly) non-relevant documents were deleted (Ref. 2, page 13).
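The 'average of numbers' calculation described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `pooled_ratios` and `pooled_precision` are invented here, and the three-question data set used to show the influence of a non-typical question is hypothetical, not taken from the test. The only figures from the report are the totals for the 35 question set at coordination level 3+.

```python
def pooled_ratios(a, b, total_relevant, n_questions, collection_size):
    """Recall, precision and fallout (percent) from grand totals.

    a = relevant documents retrieved, b = non-relevant documents retrieved,
    total_relevant = a + c, and b + d is every non-relevant
    question-document pair, i.e. n_questions * collection_size - total_relevant.
    """
    total_nonrelevant = n_questions * collection_size - total_relevant
    recall = 100 * a / total_relevant
    precision = 100 * a / (a + b)
    fallout = 100 * b / total_nonrelevant
    return recall, precision, fallout


def pooled_precision(questions):
    """Precision (percent) from summed (a, b) counts over a list of questions."""
    total_a = sum(a for a, _ in questions)
    total_b = sum(b for _, b in questions)
    return 100 * total_a / (total_a + total_b)


# Totals quoted in the text for the 35 question set at coordination level 3+.
recall, precision, fallout = pooled_ratios(157, 2865, 287, 35, 1400)
print(f"recall={recall:.1f}% precision={precision:.1f}% fallout={fallout:.1f}%")
# recall=54.7% precision=5.2% fallout=5.9%

# HYPOTHETICAL per-question counts (a, b); the last question is non-typical
# in that it retrieves a very large number of non-relevant documents.
hypothetical_questions = [(5, 5), (4, 6), (3, 300)]
print(f"{pooled_precision(hypothetical_questions):.1f}%")       # 3.7%
print(f"{pooled_precision(hypothetical_questions[:-1]):.1f}%")  # 45.0%
```

In the hypothetical data, deleting the single non-typical question raises the pooled precision from 3.7% to 45.0%, which is the kind of disproportionate influence that led the W.R.U. test to report separate figures with such questions deleted.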
The second method of merging a set of results first converts the results of each individual question into recall, precision or fallout ratios, and then obtains the final figures by averaging these ratios over all the questions. Fig. 3.18T gives the results of 35 questions calculated in both ways, thus enabling a comparison of the 'average of numbers' and 'average of ratios' methods for these particular results. Recall, fallout and precision ratios for the two methods are compared in tabular form. It can be seen that there is no significant difference in the recall ratios between the two methods; at some coordination levels the average of ratios gives a slightly higher recall ratio, and at other levels the opposite is the case. The fallout values also show no significant difference. However, in the case of the precision ratios, it is clearly seen that the average of ratios gives a substantially higher figure at all coordination levels. Fig. 3.19P is a recall/precision plot of the two methods, where the 'better' curve results from averaging the ratios. As can be seen from the tables, a recall/fallout plot would have virtually overlapping