CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Simulated ranking and document output cut-off chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

in this report, it appears to give a valid single measure for comparing the performance of different systems, and, without wishing to be overdogmatic, appears more suitable for this purpose than anything else that has been proposed. Having - to our satisfaction - established the reasonableness both of the simulated ranking method and also the method for obtaining normalised recall, the procedure was used for the four main groups of index languages. Fig. 5.11T gives the recall and precision ratios for the eight single term languages, while Fig. 5.12T gives similar figures for the fifteen concept languages. The results of the six controlled languages are given in Fig. 5.13T and the searches of titles and abstracts are shown in Fig. 5.14T. These tables also show the normalised recall ratio for each index language. In Fig. 5.15T the index languages are rearranged into an order based on this normalised recall ratio, from which it can be seen that the highest score (65.82) is obtained by Index Language I.3.a (single terms, word forms), with the lowest score (44.64) for Index Language II.1.a (single concepts, natural language). It will be noted that this table also includes the fourteen SMART options. The figures given so far have been based on what has earlier been described as the average of numbers, and it might be thought that the document ranking method would be particularly susceptible to aberrations which the average of numbers sometimes produces. The results have therefore been recalculated by the average of ratios.
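The difference between the two averaging methods can be illustrated with a minimal sketch. It assumes that the average of numbers pools the counts over all questions before forming a single ratio, and that the average of ratios forms a recall ratio for each question and then takes the mean. The question labels and the retrieved counts are hypothetical and are not taken from the test results; the function names are likewise illustrative only.

```python
from fractions import Fraction

# Hypothetical data: question label -> (relevant retrieved at some cut-off, total relevant).
# The labels and retrieved counts are invented for illustration only.
questions = {
    "Q-A": (1, 3),
    "Q-B": (2, 4),
    "Q-C": (1, 1),
}

def recall_average_of_numbers(data):
    """Pool the counts over all questions, then form one recall ratio."""
    retrieved = sum(r for r, _ in data.values())
    total = sum(t for _, t in data.values())
    return Fraction(retrieved, total)

def recall_average_of_ratios(data):
    """Form a recall ratio per question, then take the mean of the ratios."""
    ratios = [Fraction(r, t) for r, t in data.values()]
    return sum(ratios) / len(ratios)

print(float(recall_average_of_numbers(questions)))
print(float(recall_average_of_ratios(questions)))
```

In this sketch a question with a single relevant document carries as much weight under the average of ratios as one with several, which is why the two methods can give different figures for the same searches.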
To do this, as can be seen from the example in Fig. 5.16T, the indication of a relevant document is replaced by the number representing the percentage of the total recall ratio for that particular question. Thus, with question 79, there were three relevant documents, each document therefore representing 33.3% of the total. With question 1000 having four relevant documents, each relevant document is 25% of the total. Question 141 has only one relevant document, so the retrieval of this single document represents 100% recall. These figures are summed for each column, then aggregated and finally, of course, reach a total of 4200. Recall figures can then be obtained. This process was carried out for all the index languages, and as can be seen from Fig. 5.17T this results in a general increase of two or three points in the normalised recall ratio; however, when placed in order, as in Fig. 5.18T, it can be seen that this order is virtually unchanged from that obtained with the average of numbers, with a positive rank correlation of +.992.

Fig. 5.19T shows the result of ranking documents on the complete collection of 1400 documents. It covers the 42 questions with Index Language I.1.a, and is therefore directly comparable with Fig. 5.3T which was based on the smaller collection of 200 documents. The first eleven ranking groups have been retained, after which they are enlarged to take in the greater number of documents. Fig. 5.20P gives the performance curves for the two situations, and shows that, as would be expected, the smaller generality number for the 1400 document collection adversely affects the performance.

In Chapter 4, Section 8, were given the performance figures for the controlled term languages with Search E, which required some intellect to be applied to the search formulation. The result of ranking the output from these searches is given in Fig. 5.21T, and the