CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Simulated ranking and document output cut-off
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
comparison could be made between different index languages by measuring
the performance over the whole curve, and the polar coordinate graphs were
first tried with the performance curves obtained by the conventional
coordination level cut-off as given in Chapter 4, where there was no direct
relationship between the various cut-offs. The intention was to calculate
the area encompassed by the performance curve within certain limits;
with Fig. 5.6P (which is similar to Fig. 4.20H) it was calculated that, in
the area bounded by 95% recall and 85% precision, Index Language I.l.a
had an area measure of 24.9 while Index Language 1.6.a had an area
measure of 21.1. It seemed to be unnecessary to do this with these new
plots, since the document output cut-off automatically gave an exact match
between systems. It was therefore hypothesised that obtaining a normalised
recall ratio for all the systems tested would permit an 'order of effectiveness'
to be determined. To obtain this normalised recall ratio, the recall ratio
at each of the seventeen document cut-off levels would be summed and then
divided by seventeen.
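The averaging procedure just described can be sketched in a few lines of code. The seventeen cut-off levels below are illustrative placeholders (the report does not list them in this passage), and the relevant-document ranks are those of question 147 given later in the chapter.

```python
# A sketch of the Cranfield normalised recall ratio: sum the recall
# ratio at each of the seventeen document output cut-off levels and
# divide by seventeen.  These cut-off levels are ASSUMED for
# illustration; they are not the ones used in the experiment.
CUTOFF_LEVELS = [1, 2, 3, 4, 5, 7, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 200]

def recall_at_cutoff(relevant_ranks, cutoff):
    """Fraction of the relevant documents ranked at or above the cut-off."""
    found = sum(1 for rank in relevant_ranks if rank <= cutoff)
    return found / len(relevant_ranks)

def normalised_recall_ratio(relevant_ranks, levels=CUTOFF_LEVELS):
    """Average the recall ratio over all document output cut-off levels."""
    return sum(recall_at_cutoff(relevant_ranks, c) for c in levels) / len(levels)

# Question 147: the five relevant documents were ranked 6, 7, 103, 122, 138.
print(round(normalised_recall_ratio([6, 7, 103, 122, 138]), 3))
```

With a different choice of cut-off levels the absolute value changes, but the comparison between index languages tested on the same levels is unaffected.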
It was possible to test this idea by using the output from the SMART
searches on the same collection. As previously stated, Professor Salton
had results for fourteen different options, and Fig. 5.1T shows the output for
question 147. Having similar output sheets for all 42 questions, it was
possible to prepare a score sheet for each option. As an example the score
sheet for 'Cran. Con Con Index News QS' is shown in Fig. 5.7T. Reference
to Fig. 5.1T will show that the five relevant documents for Question 147 were
ranked at 6, 7, 103, 122 and 138, and it can be seen that this is shown in
the appropriate columns of Fig. 5.7T. The recall and precision ratios
based on this procedure were obtained for the fourteen SMART options and
the results are shown in Fig. 5.8T. The normalised recall ratios for each
option were then calculated and are shown in Fig. 5.9T. A normalised
recall and normalised precision for each question is given in the output
sheets of the SMART searches (see Fig. 6.11) and finally calculated for the
complete set of questions; the figures so obtained are also given in Fig. 5.9T.
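The normalised recall printed on the SMART output sheets follows Salton's rank-based definition, which compares the ranks actually assigned to the relevant documents with the ideal case in which they occupy the top ranks. A minimal sketch of that standard formula follows; the collection size of 200 used in the example is an assumption for illustration only.

```python
def salton_normalised_recall(relevant_ranks, collection_size):
    """Salton's normalised recall:
        1 - (sum of actual relevant ranks - sum of ideal ranks 1..n)
            / (n * (N - n))
    where n relevant documents are ranked within a collection of N."""
    n = len(relevant_ranks)
    ideal = sum(range(1, n + 1))
    return 1 - (sum(relevant_ranks) - ideal) / (n * (collection_size - n))

# Relevant ranks for question 147; collection size 200 is ASSUMED.
print(round(salton_normalised_recall([6, 7, 103, 122, 138], 200), 3))
```

When the relevant documents are ranked at the very top the measure is 1.0, and it falls toward 0 as they sink to the bottom of the output.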
In Fig. 5.10T these two sets of results are arranged in order of effectiveness,
the higher figures representing the better results. It will be seen that,
with very minor variations, the order obtained by the Cranfield normalised
recall is the same as that obtained with the SMART normalised recall,
with a rank correlation of +0.991. This would appear to validate the
ranking groups used at Cranfield, and also the simple method we have used
to obtain the normalised recall ratio.
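The agreement between the two orderings can be checked with the standard Spearman rank correlation coefficient, sketched below. The two rankings in the example are illustrative (the actual option rankings appear in Fig. 5.10T and are not reproduced here); they differ by a single adjacent swap, the kind of "very minor variation" observed.

```python
def spearman_rank_correlation(ranking_a, ranking_b):
    """Spearman's rank correlation for two rank orderings of the same
    items (no ties):  rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(ranking_a)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranking_a, ranking_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Fourteen options; ranking_b has one adjacent pair swapped (ILLUSTRATIVE data).
ranking_a = list(range(1, 15))
ranking_b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 12, 14]
print(round(spearman_rank_correlation(ranking_a, ranking_b), 3))
```

Even a single swapped pair among fourteen options leaves the coefficient above +0.99, which is why a value of +0.991 corresponds to near-identical orderings.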
To sum up what has been so far discussed, the document ranking
method has two major advantages.
1. It enables a series of cut-offs to be applied with equal consistency
(i.e. an equal cut-off ratio, 100(a + b)/(a + b + c + d)) between tests of different
systems using the same document/question sets, and thus solves the
problem of totalling sets of results which was discussed in Chapter 3.
2. It enables a series of recall ratios to be obtained which are directly
comparable, and permits the calculation of a single measure of performance,
normalised recall.
Regarding the measure itself, it was conceived (in a slightly different
form) and originally used by Professor Salton. It is a method of
representing performance over the whole of the operational range and there-
fore differs fundamentally from the 'single-point composite measures' which
were discussed in Chapter 3. In experimental work of the nature described