CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Supplementary tests and results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Basic and supplementary questions
Figs. 6.4T and 6.5T present the results on Index Language I.l.a for
the 221 questions when these are divided into the 94 basic questions and 127
supplementary questions (see Vol. 1, Appendix 3G). The basic questions have
a generally superior performance, particularly in the middle range of
coordination levels, and this can be partly accounted for by the higher
generality number for this group. On the other hand, documents relevant
to the supplementary questions have an average relevance grading that is
higher than that for the basic questions (2.7 as against 3.0), and this would
have been expected to more than counter the previous effect. It might be
suspected that the difference in performance is due to a stronger artificial
match between the basic questions and, say, the document titles than exists
with the supplementary questions. While analysis does not bear this out,
no other adequate explanation can be offered, and the matter is considered
again in Chapter 8.
Average of ratios
On pages 51 to 56, the matter of averaging sets of results was
considered, the discussion being on the question of using the average of
ratios or the average of numbers. To go into this in more detail, the
subset of 35 seven-starting-term questions with Index Language I.l.a on the
1400 document collection is used to demonstrate some difficulties that arise
with the average of ratios. Numerical results for the 35 questions can be
found in Appendix 4A and the results are presented (by the average of
numbers) in Fig. 4.110T.
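The distinction between the two averaging methods can be sketched as follows. The retrieval counts below are invented for illustration only; they are not the Cranfield figures.

```python
# Contrast of the two averaging methods discussed on pages 51 to 56.
# Each tuple is (relevant documents retrieved, total documents retrieved)
# for one question at a single coordination level (invented counts).
results = [(3, 10), (1, 2), (0, 5)]

# Average of numbers: pool the counts over all questions first,
# then form a single precision ratio from the totals.
total_rel = sum(r for r, _ in results)
total_ret = sum(n for _, n in results)
avg_of_numbers = total_rel / total_ret  # 4/17

# Average of ratios: form a precision ratio for each question
# separately, then average those ratios.
ratios = [r / n for r, n in results if n > 0]
avg_of_ratios = sum(ratios) / len(ratios)  # (0.3 + 0.5 + 0.0) / 3

print(avg_of_numbers, avg_of_ratios)
```

The two methods generally disagree, since the average of ratios weights every question equally while the average of numbers weights questions by the number of documents they retrieve.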
It can be seen from Fig. 6.6T that, when ratios are obtained for each
individual question, three different situations arise. Firstly, there are those
questions (e.g. Q82) where it is possible to include recall and precision
ratios at all coordination levels to the maximum of 7 (since these are all
seven-starting-term questions). Secondly, there are those questions
(e.g. Q294) where no documents are retrieved at the higher coordination
levels, so no ratios can be included. Thirdly, there are those questions
(e.g. Q40) where at the higher coordination levels no relevant documents
are retrieved although some non-relevant documents are retrieved. This
latter situation is indicated in Fig. 6.6T by an asterisk in the appropriate
column. Because of these three different situations, it is a matter for
argument as to the figure which should be used for obtaining the average
ratios. As an example, at the coordination level of four, the sum of the
precision ratios is 561.7. In order to obtain the average precision ratio
for the whole set of questions, this figure could be divided by 35, this
representing the total number of questions. Alternatively it could be
divided by 28, representing the questions which, at this particular
coordination level, retrieved some documents, either relevant or non-relevant.
Finally it could be divided by 23, representing the number of questions
which, at this particular coordination level, retrieved relevant documents.
With the results by the average of numbers for comparison, the precision
ratios obtained by these three methods are given in Fig. 6.7T.
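The three divisor choices can be illustrated with a small invented example; the counts below are hypothetical and do not reproduce Fig. 6.6T.

```python
# Three ways of averaging precision ratios at one coordination level.
# Each tuple is (relevant retrieved, total retrieved) for one question;
# None marks a question that retrieved no documents at this level, and
# (0, n) marks one that retrieved only non-relevant documents.
results = [(3, 4), (1, 5), None, (0, 2), (2, 2)]

total_questions = len(results)                     # method 1 divisor
retrieved = [r for r in results if r is not None]  # method 2 divisor
with_relevant = [r for r in retrieved if r[0] > 0] # method 3 divisor

# Precision ratios (as percentages) exist only where something was
# retrieved; questions retrieving no relevant documents contribute zero.
ratio_sum = sum(100 * r / n for r, n in retrieved)

method1 = ratio_sum / total_questions    # divide by all questions
method2 = ratio_sum / len(retrieved)     # divide by questions retrieving any document
method3 = ratio_sum / len(with_relevant) # divide by questions retrieving relevant documents

print(method1, method2, method3)
```

Since the same ratio sum is divided by successively smaller counts, method 1 always gives the lowest average and method 3 the highest, which is why the choice of divisor must be stated when results are reported this way.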
The first method is clearly unsatisfactory; it would appear to be
relatively immaterial as to whether method 2 or 3 should be used, but it is
obviously important that when results are presented by the average of
ratios, it should be made quite clear as to which procedure has been
adopted. The complexity involved in presenting results by the average of
ratios is an additional reason why, in this report, we have preferred to