CRANV2 Aslib Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 2: Supplementary Tests and Results
Cyril Cleverdon and Michael Keen, Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Basic and supplementary questions

Figs. 6.4T and 6.5T present the results on Index Language I.1.a for the 221 questions when these are divided into the 94 basic questions and 127 supplementary questions (see Vol. 1, Appendix 3G). The basic questions show a generally superior performance, particularly in the middle range of coordination levels, and this can be partly accounted for by the higher generality number for this group. On the other hand, documents relevant to the supplementary questions have an average relevance grading that is higher than that for the basic questions (2.7 as against 3.0), which would have been expected to more than counter the previous effect. It might be suspected that the difference in performance is due to a stronger artificial match between the basic questions and, say, the document titles than exists with the supplementary questions. Analysis does not bear this out, but no other adequate explanation can be offered, and the matter is considered again in Chapter 8.

Average of ratios

On pages 51 to 56, the matter of averaging sets of results was considered, the discussion being on the question of using the average of ratios or the average of numbers. To examine this in more detail, the subset of 35 seven-starting-term questions with Index Language I.1.a on the 1400 document collection is used to demonstrate some difficulties that arise with the average of ratios. Numerical results for the 35 questions can be found in Appendix 4A, and the results are presented (by the average of numbers) in Fig. 4.110T. It can be seen from Fig.
6.6T that, when ratios are obtained for each individual question, three different situations arise. Firstly, there are those questions (e.g. Q82) where it is possible to include recall and precision ratios at all coordination levels up to the maximum of 7 (since these are all seven-starting-term questions). Secondly, there are those questions (e.g. Q294) where no documents are retrieved at the higher coordination levels, so no ratios can be included. Thirdly, there are those questions (e.g. Q40) where at the higher coordination levels no relevant documents are retrieved although some non-relevant documents are retrieved. This last situation is indicated in Fig. 6.6T by an asterisk in the appropriate column.

Because of these three different situations, it is a matter for argument which figure should be used for obtaining the average ratios. As an example, at the coordination level of four, the sum of the precision ratios is 561.7. In order to obtain the average precision ratio for the whole set of questions, this figure could be divided by 35, the total number of questions. Alternatively, it could be divided by 28, the number of questions which, at this particular coordination level, retrieved some documents, either relevant or non-relevant. Finally, it could be divided by 23, the number of questions which, at this particular coordination level, retrieved relevant documents. With the results by the average of numbers for comparison, the precision ratios obtained by these three methods are given in Fig. 6.7T. The first method is clearly unsatisfactory; it appears relatively immaterial whether method 2 or method 3 is used, but it is obviously important that, when results are presented by the average of ratios, it is made quite clear which procedure has been adopted.
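The three divisor conventions can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and labels are our own, and the figures are simply those quoted above for coordination level four (sum of precision ratios 561.7; 35 questions in total, of which 28 retrieved some documents and 23 retrieved at least one relevant document).

```python
def average_of_ratios(ratio_sum, divisor):
    """Average precision ratio under a chosen divisor convention."""
    return ratio_sum / divisor

RATIO_SUM = 561.7            # sum of individual precision ratios (per cent)
TOTAL_QUESTIONS = 35         # method 1: every question in the subset
RETRIEVING_QUESTIONS = 28    # method 2: questions retrieving any documents
RELEVANT_RETRIEVING = 23     # method 3: questions retrieving relevant documents

for label, divisor in [("method 1 (all questions)", TOTAL_QUESTIONS),
                       ("method 2 (any retrieval)", RETRIEVING_QUESTIONS),
                       ("method 3 (relevant retrieved)", RELEVANT_RETRIEVING)]:
    print(f"{label}: {average_of_ratios(RATIO_SUM, divisor):.1f}%")
```

The spread of the three results (roughly 16%, 20%, and 24% respectively) shows why the choice of divisor must be stated whenever averages of ratios are reported.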
The complexity involved in presenting results by the average of ratios is an additional reason why, in this report, we have preferred to