Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Simulated ranking and document output cut-off chapter

Cyril Cleverdon, Michael Keen

Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

Index        Normalised      Normalised
Language     Recall Ratio    Recall Ratio
             (basic)         (weighted)

I.1.a        65.00           67.12
I.7.a        64.05           65.94
III.1.a      61.76           63.64
III.6.a      59.17           61.06
II.9.a       57.11           58.94
II.5.a       55.05           57.11

FIGURE 5.25T COMPARISON OF NORMALISED RECALL RATIOS BY BASIC SCORING METHOD (as Fig. 5.15T) AND BY WEIGHTED SCORING METHOD FOR SIX INDEX LANGUAGES.

retrieval would be 26%. On the other hand, as was discussed earlier in this chapter, the theoretical maximum performance cannot be achieved because the number of relevant documents differs from question to question, so the highest possible normalised recall ratio would be 86.70%.

It should also be emphasised that the normalised recall ratio only has meaning within the context of the manner in which it has been calculated. In this particular case it was obtained by averaging the results of the seventeen cut-off groups given on page 198. Suppose that the number of groups had been reduced to thirteen by combining the first six groups into two larger groups covering documents ranked 51 - 100 and documents ranked 101 - 200. The effect would be to reduce the normalised recall ratio for index language I.1.a from 65% to 55.7%. On the other hand, if the original groups were broken down so that no group had more than ten rankings, the normalised recall ratio based upon the resulting twenty-seven groups would be 75.1%. At the same time, either of these actions would also change the minimum figure based on random retrieval and the maximum possible figure, both considered in the previous paragraph.
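The dependence of the normalised recall ratio on the choice of cut-off groups can be illustrated with a minimal sketch. It assumes the ratio is the mean of the recall values observed at each cut-off point, expressed as a percentage. The ranks of the relevant documents, the cut-off points, and the function names below are all hypothetical, chosen only for illustration; they are not the seventeen groups of page 198, and the figures produced are not results from the project.

```python
from typing import Sequence


def recall_at(ranks: Sequence[int], cutoff: int) -> float:
    """Fraction of the relevant documents ranked at or above `cutoff`."""
    return sum(1 for r in ranks if r <= cutoff) / len(ranks)


def normalised_recall(ranks: Sequence[int], cutoffs: Sequence[int]) -> float:
    """Mean recall over all cut-off points, as a percentage (assumed definition)."""
    return 100.0 * sum(recall_at(ranks, c) for c in cutoffs) / len(cutoffs)


# Hypothetical ranks at which one question's relevant documents were retrieved.
relevant_ranks = [1, 3, 8, 15, 40, 120]

# Hypothetical cut-off points: a finer set, and a coarser set in which the
# deeper ranks (beyond 20) are covered by only two cut-offs.
fine_cutoffs = [5, 10, 20, 50, 75, 100, 150, 200]
coarse_cutoffs = [5, 10, 20, 100, 200]

fine = normalised_recall(relevant_ranks, fine_cutoffs)
coarse = normalised_recall(relevant_ranks, coarse_cutoffs)
print(f"fine grouping:   {fine:.2f}%")
print(f"coarse grouping: {coarse:.2f}%")
```

With this invented data the coarser grouping gives the lower figure, because merging the deeper cut-offs removes several high-recall points from the average. The same ranking therefore yields different ratios under different groupings, which is why ratios are comparable only when calculated in the same manner.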