CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Simulated ranking and document output cut-off
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
level higher than three. It will be seen from Fig. 5.2 that at the single
term level, only three of these documents have been found. The remaining
relevant document can only be retrieved by searching through the remainder
of the collection, namely 105 documents, and therefore at c = 0, x is
taken to be 105. In addition, the equations do not always produce whole
numbers, so the calculated rank has to be taken to the nearest whole number, or to the
lower whole number where the value falls exactly between two whole
numbers (since Q123 is an odd-numbered question).
Question 123

$$
\begin{array}{lllll}
\text{At level } c = 3: & x_3 = 6, & y_3 = 3, & X_3 = 0, & Y_3 = 0 \\
\text{At level } c = 2: & x_2 = 21, & y_2 = 0, & X_2 = 6, & Y_2 = 3 \\
\text{At level } c = 1: & x_1 = 68, & y_1 = 0, & X_1 = 27, & Y_1 = 3 \\
\text{At level } c = 0: & x_0 = 105, & y_0 = 1, & X_0 = 95, & Y_0 = 3
\end{array}
$$
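Then:

$$
\begin{aligned}
{}_{3}R_{1} &= 0 + (1 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{7}{4} \rightarrow 2 \\
{}_{3}R_{2} &= 0 + (2 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{7}{2} \rightarrow 3 \\
{}_{3}R_{3} &= 0 + (3 - 0)\left(\frac{6 + 1}{3 + 1}\right) = \frac{21}{4} \rightarrow 5 \\
{}_{0}R_{4} &= 95 + (4 - 3)\left(\frac{105 + 1}{1 + 1}\right) = 95 + 53 = 148
\end{aligned}
$$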
The argument for this simulated ranking method is given in
Appendix 5A.
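The worked example for Question 123 can be reproduced with a short Python sketch. The rank formula used here is inferred from the worked values rather than quoted from the report: for the r-th relevant document, first found at level c, ${}_{c}R_{r} = X_c + (r - Y_c)\,\frac{x_c + 1}{y_c + 1}$. The function names `round_rank` and `simulated_ranks`, the input layout of (c, x_c, y_c) tuples, and the collection size of 200 (taken from X_0 + x_0 = 95 + 105) are assumptions made for illustration. The text states only that exact halves are taken to the lower number for an odd-numbered question; taking them to the higher number for an even-numbered question is a hypothetical choice in this sketch.

```python
import math
from fractions import Fraction


def round_rank(value: Fraction, question_number: int) -> int:
    """Round a simulated rank to a whole number.

    Values go to the nearest whole number. An exact half goes to the lower
    number for an odd-numbered question, as the text states; going to the
    higher number for an even-numbered question is an assumption.
    """
    lower = math.floor(value)
    remainder = value - lower
    if remainder < Fraction(1, 2):
        return lower
    if remainder > Fraction(1, 2):
        return lower + 1
    return lower if question_number % 2 == 1 else lower + 1


def simulated_ranks(levels, collection_size, total_relevant, question_number):
    """Simulated rank of each relevant document for one question.

    `levels` lists (c, x_c, y_c) from the highest level down to c = 1:
    x_c documents are first retrieved at level c, y_c of them relevant.
    Level c = 0 holds everything not retrieved at any level.
    Returns (c, r, rank) tuples in order of r.
    """
    x0 = collection_size - sum(x for _, x, _ in levels)
    y0 = total_relevant - sum(y for _, _, y in levels)
    if x0 < 0 or y0 < 0:
        raise ValueError("level counts exceed the collection or relevant totals")

    ranks = []
    X = Y = 0  # X_c, Y_c: documents and relevant documents at higher levels
    for c, x, y in list(levels) + [(0, x0, y0)]:
        for r in range(Y + 1, Y + y + 1):
            # cR_r = X_c + (r - Y_c) * (x_c + 1) / (y_c + 1), kept exact
            value = X + (r - Y) * Fraction(x + 1, y + 1)
            ranks.append((c, r, round_rank(value, question_number)))
        X += x
        Y += y
    return ranks


# Question 123: 200 documents in the collection, 4 of them relevant.
print(simulated_ranks([(3, 6, 3), (2, 21, 0), (1, 68, 0)], 200, 4, 123))
# [(3, 1, 2), (3, 2, 3), (3, 3, 5), (0, 4, 148)]
```

The printed ranks match the values 2, 3, 5 and 148 obtained in the text. Exact `Fraction` arithmetic is used so that a value such as 7/2 is recognised as an exact half rather than a floating-point approximation. One way to read the factor (x_c + 1)/(y_c + 1) is as the expected spacing of y_c relevant documents placed at random among the x_c documents of a level.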
When all such rankings have been calculated for the searches with a
single index language, the results are entered on a score sheet as in
Fig. 5.3T, which represents the results as given in Fig. 5.2T. Seventeen
ranking groups were selected to have approximately the same number of
documents falling into each group; these were 1; 2; 3; 4; 5; 6-7; 8-10;
11-15; 16-20; 21-30; 31-50; 51-75; 76-100; 101-125; 126-150; 151-175; and
176-200. A cross is put in the appropriate column of the score sheet for
every relevant document for the 42 questions. From the score sheet, the
total number of relevant documents retrieved at each of the seventeen
cut-off levels can now be obtained. Fig. 5.3T shows that, in the 42
searches, the first document retrieved was relevant on 23 occasions. As
there were 198 documents relevant to the 42 questions, the recall ratio at
this stage can be calculated as 23/198 x 100 = 12%; the precision ratio is
calculated on the basis of one document having been retrieved for each
question, and is therefore 23/42 x 100 = 55%. At the second cut-off level,
the relevant documents ranked second are added to the relevant
documents so far retrieved, so the recall ratio increases to 22%. The
precision ratio is now calculated on the basis of 2 x 42 = 84 documents having
been retrieved, and is therefore 51%. Recall and precision ratios are
similarly calculated for each document output cut-off level; ultimately the
recall ratio will reach 100%.
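The score-sheet tally and the ratio calculation can be sketched in Python as follows. The group boundaries are the seventeen ranking groups given in the text; the names `GROUP_UPPER_BOUNDS`, `score_sheet` and `recall_precision` are hypothetical. The text gives the number of documents retrieved per question only for the first two cut-off levels (1 and 2), so treating the upper bound of each later group (for example 7 for the group 6-7) as the cut-off is an assumption of this sketch.

```python
from bisect import bisect_left

# Upper bounds of the seventeen ranking groups on the score sheet:
# 1; 2; 3; 4; 5; 6-7; 8-10; 11-15; 16-20; 21-30; 31-50; 51-75; 76-100;
# 101-125; 126-150; 151-175; 176-200.
GROUP_UPPER_BOUNDS = [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 50, 75, 100,
                      125, 150, 175, 200]


def score_sheet(ranks):
    """Count crosses per ranking group, one cross per relevant document.

    `ranks` holds the simulated rank of every relevant document,
    pooled over all questions.
    """
    counts = [0] * len(GROUP_UPPER_BOUNDS)
    for rank in ranks:
        if not 1 <= rank <= GROUP_UPPER_BOUNDS[-1]:
            raise ValueError(f"rank {rank} is outside 1..{GROUP_UPPER_BOUNDS[-1]}")
        # First group whose upper bound is >= rank, e.g. rank 6 -> group 6-7.
        counts[bisect_left(GROUP_UPPER_BOUNDS, rank)] += 1
    return counts


def recall_precision(counts, total_relevant, n_questions):
    """Cumulative recall and precision ratios (%) at each cut-off level.

    Assumes (hypothetically) that the cut-off for each group is its upper
    bound, i.e. that many documents are retrieved for every question.
    Returns (cut_off, recall, precision) rows.
    """
    table = []
    found = 0
    for count, cutoff in zip(counts, GROUP_UPPER_BOUNDS):
        found += count
        recall = 100 * found / total_relevant
        precision = 100 * found / (cutoff * n_questions)
        table.append((cutoff, recall, precision))
    return table
```

Using only the figures quoted in the text (23 relevant documents at rank 1, 198 relevant documents in all, 42 questions), `recall_precision([23], 198, 42)` returns a single row whose recall and precision round to 12% and 55%. For Question 123 alone, `score_sheet([2, 3, 5, 148])` places one cross in each of the columns for ranks 2, 3, 5 and 126-150.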