CRANV2
Aslib Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 2: Simulated Ranking and Document Output Cut-off (chapter)
Cyril Cleverdon and Michael Keen, Cranfield

An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

The normalised recall ratio is shown for each index language by Search E and Search A. It will be seen that there is an improvement with each language of from 1 to 2 points.

Fig. 5.22T shows the ranking score sheet for Index Language I.1.a with the 42 questions on the 200 document collection, but with the lowest level of exhaustivity of indexing. Fig. 5.23P compares these results with those obtained under similar conditions except that exhaustivity was at its highest level (as Fig. 5.3T).

Four grades of document relevance were used in the tests, and the effect on performance of each of the relevance grades has been considered in Section 6 of Chapter 4. An alternative method of scoring performance from that so far used would be to take account of these relevance gradings by giving each document a weighting related to its relevance grading. The use of the document output cut-off method and normalised recall permits this to be done in what might be considered a meaningful manner. A simple form of weighting is to give a score of 4 to those documents rated relevance 1, a score of 3 for documents of relevance 2, a score of 2 for documents of relevance 3, and a score of 1 for documents rated relevance 4. The effect of this would be that question 119, for instance, which has two documents (1378 and 1667) rated relevance 2 and four documents (1324, 1666, 1670 and 2391) rated relevance 3, would now have a total "retrieval score" of (2 x 3) + (4 x 2) = 14. Referring back to Fig.
5.3T, the score sheet for this question would be amended to show the weighting of each relevant document according to the order in which the documents of the two levels of relevance were retrieved. This was done for the 42 questions by Index Language I.1.a, and the amended score sheet is given as Fig. 5.24T. The recall ratio is now determined on the total "points" score for the set of questions, which is 421. At a document cut-off of 1, the recall ratio (the points scored at that cut-off divided by 421) is therefore shown to be 14%, and the recall ratios are similarly calculated for the other sixteen cut-off groups. The normalised recall ratio is then calculated as being 67.12.

This procedure was repeated for five other index languages to find whether the effect of a weighting score made any difference to their comparative performance. As can be seen from Fig. 5.25T, there was in each case an increase of approximately two points in the normalised recall, so it does not appear that this method of weighting makes any significant difference to the overall comparison. The exercise was repeated using different weightings, with a score of 10 for documents rated relevance 1, a score of 5 for documents rated relevance 2, a score of 3 for documents rated relevance 3, and a score of 1 for documents rated relevance 4. This resulted in a further small increase in the normalised recall ratios, but made no significant difference in the comparison between systems. It would be incorrect to state that some form of weighting might not be useful in certain circumstances, but it would seem that it does not have any particular value in this test.

In connection with the normalised recall ratio, it is obvious that there is what could be considered a minimum figure, which is based on the random retrieval of the whole collection for every question.
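The weighted scoring described above is simple arithmetic, and a minimal sketch may make it concrete. The following Python fragment is illustrative only: the function names and the list representation of a ranked output are assumptions for this sketch, not the report's own procedure, though the 4/3/2/1 weights and the worked figure for question 119 are taken from the text.

```python
# Weight assigned to each relevance grade (grade 1 = most relevant),
# following the simple 4/3/2/1 scheme described in the text.
WEIGHTS = {1: 4, 2: 3, 3: 2, 4: 1}

def retrieval_score(relevance_grades):
    """Total "points" score for a question's relevant documents."""
    return sum(WEIGHTS[g] for g in relevance_grades)

def recall_at_cutoff(ranked_output, total_points, cutoff):
    """Weighted recall ratio at a document output cut-off.

    ranked_output lists, in rank order, the relevance grade (1-4) of
    each retrieved document, or None for a non-relevant document.
    total_points is the total "points" score for the question set.
    """
    scored = sum(WEIGHTS[g] for g in ranked_output[:cutoff] if g is not None)
    return scored / total_points

# Question 119: two documents of relevance 2, four of relevance 3.
print(retrieval_score([2, 2, 3, 3, 3, 3]))  # (2 x 3) + (4 x 2) = 14
```

Applying `recall_at_cutoff` at each of the seventeen cut-off groups gives the sequence of weighted recall ratios from which a normalised recall figure could then be computed.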
For instance, the three relevant documents of Question 79 would, with random retrieval, be ranked 50, 100 and 150, while the seven relevant documents of Question 190 would be ranked 25, 50, 75, 100, 125, 150 and 175. With this particular document/question set, the normalised recall ratio based on this random