Aslib Cranfield Research Project
Factors Determining the Performance of Indexing Systems: Volume 2
Simulated ranking and document output cut-off
Cyril Cleverdon and Michael Keen, Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

comparison could be made between different index languages by measuring the performance over the whole curve, and the polar coordinate graphs were first tried with the performance curves obtained by the conventional coordination level cut-off as given in Chapter 4, where there was no direct relationship between the various cut-offs. The intention was to calculate the area encompassed by the performance curve within certain limits; with Fig. 5.6P (which is similar to Fig. 4203H) it was calculated that, in the area bounded by 95% recall and 85% precision, Index Language I.1.a had an area measure of 24.9 while Index Language I.6.a had an area measure of 21.1. It seemed to be unnecessary to do this with these new plots, since the document output cut-off automatically gave an exact match between systems. It was therefore hypothesised that obtaining a normalised recall ratio for all the systems tested would permit an 'order of effectiveness' to be determined. To obtain this normalised recall ratio, the recall ratio at each of the seventeen document cut-off levels would be summed and then divided by seventeen.

It was possible to test this idea by using the output from the SMART searches on the same collection. As previously stated, Professor Salton had results for fourteen different options, and Fig. 5.1T shows the output for question 147. Having similar output sheets for all 42 questions, it was possible to prepare a score sheet for each option. As an example the score sheet for 'Cran. Con Con Index News QS' is shown in Fig. 5.7T. Reference to Fig.
5.1T will show that the five relevant documents for Question 147 were ranked at 6, 7, 103, 122 and 138, and it can be seen that this is shown in the appropriate columns of Fig. 5.7T. The recall and precision ratios based on this procedure were obtained for the fourteen SMART options and the results are shown in Fig. 5.8T. The normalised recall ratios for each option were then calculated and are shown in Fig. 5.9T. A normalised recall and normalised precision for each question is given in the output sheets of the SMART searches (see Fig. 6.11) and finally calculated for the complete set of questions; the figures so obtained are also given in Fig. 5.9T. In Fig. 5.10T these two sets of results are arranged in order of effectiveness, the higher figures representing the better results. It will be seen that, with very minor variations, the order obtained by the Cranfield normalised recall is the same as that obtained with the SMART normalised recall, with a rank correlation of +0.991. This would appear to validate the ranking groups used at Cranfield, and also the simple method we have used to obtain the normalised recall ratio.

To sum up what has been so far discussed, the document ranking method has two major advantages.

1. It enables a series of cut-offs to be applied with equal consistency (i.e. an equal cut-off ratio, 100(a + b)/(a + b + c + d)) between tests of different systems using the same document/question sets, and thus solves the problem of totalling sets of results which was discussed in Chapter 3.

2. It enables a series of recall ratios to be obtained which are directly comparable, and permits the calculation of a single measure of performance, normalised recall.

Regarding the measure itself, it was conceived (in a slightly different form) and originally used by Professor Salton.
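The calculation described above can be sketched in a few lines: recall is taken at each document output cut-off level and the results averaged. This is a minimal illustration only; the relevant ranks are those quoted for Question 147, but the seventeen cut-off levels used at Cranfield are not listed in this chapter, so the levels below are assumed placeholder values, not the actual ones.

```python
def recall_at_cutoff(relevant_ranks, cutoff):
    """Recall ratio (%) after the top `cutoff` ranked documents are retrieved."""
    found = sum(1 for rank in relevant_ranks if rank <= cutoff)
    return 100.0 * found / len(relevant_ranks)

def normalised_recall(relevant_ranks, cutoff_levels):
    """Cranfield normalised recall: the recall ratios at every cut-off
    level are summed and divided by the number of levels."""
    recalls = [recall_at_cutoff(relevant_ranks, c) for c in cutoff_levels]
    return sum(recalls) / len(recalls)

# The five relevant documents for Question 147 were ranked 6, 7, 103, 122, 138.
ranks_q147 = [6, 7, 103, 122, 138]

# Seventeen illustrative cut-off levels (assumed; not taken from the report).
levels = [1, 2, 3, 4, 5, 10, 15, 20, 30, 40, 50, 75, 100, 125, 150, 175, 200]

print(round(normalised_recall(ranks_q147, levels), 1))
```

With these assumed levels the single question scores about 41.2; in the actual procedure the averaging was carried out over all 42 questions for each of the fourteen options before the orders of effectiveness were compared.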
It is a method of representing performance over the whole of the operational range and therefore differs fundamentally from the 'single-point composite measures' which were discussed in Chapter 3. In experimental work of the nature described