CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2. Chapter: Simulated ranking and document output cut-off. Cyril Cleverdon, Michael Keen. Cranfield. An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

level higher than three. It will be seen from Fig. 5.2 that at the single term level, only three of these documents have been found. The remaining relevant document can only be retrieved by searching through the remainder of the collection, namely 105 documents, and therefore at c=0, x is taken to be 105. In addition, the equations do not always produce whole numbers, so the calculated rank has to be taken to the nearest whole number, or to the lower whole number where the value falls exactly between two whole numbers (since Q123 is an odd-numbered question).

Question 123

At level c=3: x_3 = 6, y_3 = 3, X_3 = 0, Y_3 = 0
At level c=2: x_2 = 21, y_2 = 0, X_2 = 6, Y_2 = 3
At level c=1: x_1 = 68, y_1 = 0, X_1 = 27, Y_1 = 3
At level c=0: x_0 = 105, y_0 = 1, X_0 = 95, Y_0 = 3

Then:

\[ R_1 = 0 + (1 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{7}{4} \approx 2 \]
\[ R_2 = 0 + (2 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{7}{2} \approx 3 \]
\[ R_3 = 0 + (3 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{21}{4} \approx 5 \]
\[ R_4 = 95 + (4 - 3)\left(\frac{105 + 1}{1 + 1}\right) = 95 + 53 = 148 \]

The argument for this simulated ranking method is given in Appendix 5A. When all such rankings have been calculated for the searches with a single index language, the results are entered on a score sheet as in Fig. 5.3T, which represents the results as given in Fig. 5.2T. Seventeen ranking groups were selected to have approximately the same number of documents falling into each group; these were 1; 2; 3; 4; 5; 6-7; 8-10; 11-15; 16-20; 21-30; 31-50; 51-75; 76-100; 101-125; 126-150; 151-175; and 176-200.
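The Question 123 calculation can be checked with a short sketch. It assumes the rank of the i-th relevant document found at level c is X_c + (i - Y_c)(x_c + 1)/(y_c + 1), a form inferred from the worked figures rather than quoted from the report, with x_c and y_c read as the documents and relevant documents retrieved at level c, and X_c and Y_c as the corresponding totals from all higher levels. The names `simulated_rank` and `rank_all_relevant` are hypothetical, and rounding an exact half upward for even-numbered questions is an assumption, since the text states only the rule for odd-numbered ones.

```python
from fractions import Fraction
import math

# Question 123 figures from the text: level c -> (x_c, y_c, X_c, Y_c).
Q123_LEVELS = {
    3: (6, 3, 0, 0),
    2: (21, 0, 6, 3),
    1: (68, 0, 27, 3),
    0: (105, 1, 95, 3),
}

def simulated_rank(i, x, y, X, Y, odd_question):
    """Simulated rank of the i-th relevant document (formula assumed, see lead-in)."""
    raw = X + Fraction((i - Y) * (x + 1), y + 1)  # exact arithmetic, no float error
    lower = math.floor(raw)
    remainder = raw - lower
    if remainder == Fraction(1, 2):
        # Exactly between two whole numbers: lower value for odd-numbered
        # questions (from the text); upper value for even ones (assumption).
        return lower if odd_question else lower + 1
    return lower + 1 if remainder > Fraction(1, 2) else lower

def rank_all_relevant(levels, odd_question):
    """Rank every relevant document, working from the highest level downwards."""
    ranks = []
    for c in sorted(levels, reverse=True):
        x, y, X, Y = levels[c]
        for i in range(Y + 1, Y + y + 1):  # relevant documents first found at level c
            ranks.append(simulated_rank(i, x, y, X, Y, odd_question))
    return ranks

print(rank_all_relevant(Q123_LEVELS, odd_question=True))  # [2, 3, 5, 148]
```

Exact fractions are used so that a value such as 7/2 is recognised as falling precisely between two whole numbers, which is the case the odd/even rule is meant to settle.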
A cross is put in the appropriate column of the score sheet for every relevant document for the 42 questions. From the score sheet, the total number of relevant documents retrieved at each of the seventeen cut-off levels can now be obtained. In Fig. 5.3T it is shown that, in the 42 searches, the first document retrieved was relevant on 23 occasions. As there were 198 documents relevant to the 42 questions, the recall ratio at this stage can be calculated as 23/198 x 100 = 12%; the precision ratio is calculated on the basis of one document having been retrieved for each question, that is 23/42 x 100 = 55%. [passage illegible in source] documents so far retrieved, so the recall ratio increases to 22%. The precision ratio is now calculated on the basis of 2 x 42 documents having been retrieved, and is therefore 5[illegible]%. Recall and precision ratios are similarly calculated for each document output cut-off level; ultimately the recall ratio will reach 100%.
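The cut-off arithmetic can be sketched in the same way. The sketch uses only figures stated in the text (42 questions, 198 relevant documents, 23 relevant documents at the first cut-off, and the seventeen ranking groups). The names `group_index` and `cut_off_ratios` are hypothetical, and no counts are supplied for later cut-off levels because the text does not give them legibly.

```python
from bisect import bisect_left

# Upper bound of each of the seventeen ranking groups listed in the text:
# 1; 2; 3; 4; 5; 6-7; 8-10; 11-15; 16-20; 21-30; 31-50; 51-75; 76-100;
# 101-125; 126-150; 151-175; 176-200.
GROUP_UPPER_BOUNDS = [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 50, 75, 100, 125, 150, 175, 200]

def group_index(rank):
    """Zero-based score-sheet column for a simulated rank between 1 and 200."""
    return bisect_left(GROUP_UPPER_BOUNDS, rank)

def cut_off_ratios(relevant_retrieved, cut_off, total_relevant=198, questions=42):
    """Recall and precision (as percentages) after `cut_off` documents per question."""
    recall = 100 * relevant_retrieved / total_relevant
    precision = 100 * relevant_retrieved / (cut_off * questions)
    return recall, precision

# First cut-off level: the first document was relevant in 23 of the 42 searches.
recall, precision = cut_off_ratios(relevant_retrieved=23, cut_off=1)
print(round(recall), round(precision))  # 12 55

# A simulated rank of 148 falls in the group 126-150 (column index 14).
print(group_index(148))  # 14
```

For later cut-off levels, `relevant_retrieved` would be the cumulative number of crosses in all score-sheet columns up to and including that level.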