CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Conclusions chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

While this particular result has obviously been prepared to illustrate the point, it would seem that this is an example of what has been consistently happening in the test searches with the Single Term and Controlled Term index languages. Whereas the broadening of the term classes has increased the recall of relevant documents at higher coordination levels, the effect of doing this has been more than offset by the increased number of non-relevant documents. Only when the index terms being used are too precise, as in the case of the Simple Concept Natural Language, can the formation of broad classes of terms bring about an improvement.

Finally, it is necessary to consider the measures which have been used in this test, and to ask whether it is possible that some other measures would have brought about a change in the comparative results. Obviously suspect is the normalised recall ratio, based on a simulated rank output. While at first it might seem that such a measure is likely to weigh in favour of systems having high recall ratios, it is in fact mainly influenced by the first two ranked documents. At this stage, the recall ratios, as can be seen from Figures 5.11T - 5.14T, are as follows:

    Recall Ratio at Document
    Output Cut-off of 2         Index Language

    23%                         I.2, I.3
    22%                         I.1
    21%                         I.6, I.7
    20%
    19%                         I.5, I.8, II.9, III.2
    18%                         II.3, II.12, IV.3, IV.4
    17%                         II.10, III.1, IV.1, IV.2
    16%                         I.9, II.11, III.3, III.4
    15%                         II.5
    14%                         II.13
    13%                         II.2, II.8, III.5
    12%                         II.1, II.4, II.6, III.6
    11%                         II.7, II.15
    10%
    9%                          II.14

It will be seen that, with the exception of Index Language II.3, which (at 18%) rises from 28 to 10=, there is a strong correlation between this ordering and the final ordering as given in Table 8.1. With the document output cut-off method, recall and precision are, as we explained earlier, completely interdependent, and therefore it would appear to be a measure that is quite impartial as between recall and precision. It is known that others are investigating different measures, and most of those that have been proposed have already been considered in Chapter 3. Now that the results of this test are available, it is to be hoped that proponents of new measures will be able to demonstrate any superiority over those used in this report. Until such time, there appears to be no reason to suggest that the measures have affected the comparative results.

With the possible doubtful exception of the subject field, there appears to be nothing in the test environment which could be held responsible for serious distortion of the results as between one system and another. Therefore it is necessary to proceed on the assumption that the results are