CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Simulated ranking and document output cut-off
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
in this report, it appears to give a valid single measure for comparing
the performance of different systems, and, without wishing to be
overdogmatic, appears more suitable for this purpose than anything else
that has been proposed.
Having - to our satisfaction - established the reasonableness both of
the simulated ranking method and also the method for obtaining normalised
recall, the procedure was used for the four main groups of index languages.
Fig. 5.11T gives the recall and precision ratios for the eight single term
languages, while Fig. 5.12T gives similar figures for the fifteen concept
languages. The results of the six controlled languages are given in
Fig. 5.13T and the searches of titles and abstracts are shown in Fig. 5.14T.
These tables also show the normalised recall ratio for each index language.
In Fig. 5.15T the index languages are rearranged into an order based on
this normalised recall ratio, from which it can be seen that the highest
score (65.82) is obtained by Index Language I.3.a (single terms, word
forms), with the lowest score (44.64) for Index Language II.1.a (single
concepts, natural language). It will be noted that this table also includes
the fourteen SMART options.
The figures given so far have been based on what has earlier been
described as the average of numbers, and it might be thought that the
document ranking method would be particularly susceptible to aberrations
which the average of numbers sometimes produces. The results have
therefore been recalculated by the average of ratios. To do this, as can
be seen from the example in Fig. 5.16T, the indication of a relevant
document is replaced by the number representing the percentage of the
total recall ratio for that particular question. Thus, with question 79,
there were three relevant documents, each document therefore representing
33.3% of the total. With question 1000 having four relevant documents,
each relevant document is 25% of the total. Question 141 has only one
relevant document, so the retrieval of this single document represents
100% recall. These figures are summed for each column, then aggregated
and finally, of course, reach a total of 4200. Recall figures can then be
obtained.
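
The difference between the two averaging methods can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the relevant-document counts (three, four and one) are those quoted above for the three example questions, with question numbers as printed, while the retrieved counts at the cut-off and the function names are hypothetical and are not taken from Fig. 5.16T.

```python
# Relevant documents per question, as quoted in the text (labels as printed).
relevant = {"Q79": 3, "Q1000": 4, "Q141": 1}

# HYPOTHETICAL: relevant documents retrieved by some cut-off point.
retrieved = {"Q79": 1, "Q1000": 2, "Q141": 1}


def recall_by_numbers(retrieved, relevant):
    """Average of numbers: pool the counts over all questions, then divide."""
    return 100.0 * sum(retrieved.values()) / sum(relevant.values())


def recall_by_ratios(retrieved, relevant):
    """Average of ratios: compute recall per question, then take the mean."""
    per_question = [100.0 * retrieved[q] / relevant[q] for q in relevant]
    return sum(per_question) / len(per_question)


# Weight carried by each relevant document: its percentage share of the
# total recall for its own question (33.3, 25 and 100 in the text).
weights = {q: 100.0 / n for q, n in relevant.items()}

# Summed over every relevant document, the weights reach 100 per question,
# i.e. 300 for three questions (4200 for the 42 questions of the test).
grand_total = sum(weights[q] * relevant[q] for q in relevant)

print(round(recall_by_numbers(retrieved, relevant), 2))
print(round(recall_by_ratios(retrieved, relevant), 2))
print(round(grand_total, 2))
```

With these assumed figures the question having a single relevant document counts for a full third of the average of ratios, whereas under the average of numbers it contributes only one document in eight.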
This process was carried out for all the index languages, and as can
be seen from Fig. 5.17T this results in a general increase of two or
three points in the normalised recall ratio; however, when placed in order,
as in Fig. 5.18T, it can be seen that this order is virtually unchanged
from that obtained with the average of numbers, with a positive rank
correlation of +.992.
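
A rank correlation of this kind can be computed as sketched below. The text does not state which coefficient was used; Spearman's coefficient is assumed here, and the five pairs of scores are hypothetical values chosen only to show the arithmetic, not figures from Fig. 5.17T or Fig. 5.18T.

```python
def ranks_descending(scores):
    """Rank 1 goes to the highest score. Assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(scores_a, scores_b):
    """Spearman rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(scores_a)
    ranks_a = ranks_descending(scores_a)
    ranks_b = ranks_descending(scores_b)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# HYPOTHETICAL normalised recall scores for five index languages.
by_numbers = [70.0, 60.0, 55.0, 50.0, 40.0]
by_ratios = [72.0, 61.0, 63.0, 53.0, 42.0]

rho = spearman_rho(by_numbers, by_ratios)
print(round(rho, 3))
```

In this small example every score rises by a point or more, yet only two adjacent languages exchange places, so the coefficient stays close to +1.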
Fig. 5.19T shows the result of ranking documents on the complete
collection of 1400 documents. It covers the 42 questions with Index
Language I.1.a, and is therefore directly comparable with Fig. 5.3T
which was based on the smaller collection of 200 documents. The first
eleven ranking groups have been retained, after which they are enlarged
to take in the greater number of documents. Fig. 5.20P gives the
performance curves for the two situations, and shows that, as would be
expected, the smaller generality number for the 1400 document collection
adversely affects the performance.
In Chapter 4, Section 8, were given the performance figures for
the controlled term languages with Search E, which required some
intellect to be applied to the search formulation. The result of ranking
the output from these searches is given in Fig. 5.21T, and the