CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Main test results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
83
Section 1 Introductory Tables
The first set of results is based on 221 questions (question subset
3) searched on the full collection of 1400 documents. In Tables
4.100T to 4.104T, five of the languages investigated on the single
term indexing are presented, namely languages 1.1.a, 1.2.a, 1.3.a,
1.5.a, and 1.6.a, showing the effect of recall devices. The results
for these five languages are presented on a single plot in Figure
4.105P, with a performance curve for each of the languages plotted.
It was very difficult to find a satisfactory method of totalling
the results with these 221 questions, because of the large variation
in the number of starting terms. This problem was fully discussed
in Chapter 3 (pages 51-71), but in order to validate the selected
method, a subset of 35 questions was selected, the characteristic
of which was that each question had seven starting terms, and in this
respect was an average set. The results for searches for the
same five index languages are presented in Figures 4.110T - 4.114T.
The questions are again searched on the 1400 document collection,
and a single plot of the five curves is given in Fig. 4.115P.
A further subset of questions, 42 in number (subset 2), is used
for the results given in Figs. 4.120T to 4.124T. The same five
languages are used, and the 1400 document collection is searched.
These questions pose the same problems in totalling as did the 221
questions, since the 42 questions have varying numbers of starting terms,
ranging from three to twelve. A single plot of the five languages is
given in Fig. 4.125P.
In these three sets of questions, a progression may be observed
from the largest set of questions (221) to a smaller set of 35 questions
specially selected to minimise the totalling problem, and then to another
small set of 42 questions that, because of the variation in number of
starting terms, has the same totalling problem as the set of 221 questions.
The differences in question sample size may be expected to affect any
direct comparison of the three sets of questions, in addition to the
totalling method problem. The effect of this can be seen in Fig.
4.130P where the natural language results (Language 1.1.a) are compared
for the three subsets of questions, the three curves being based on the
results in Figs. 4.100T, 4.110T, and 4.120T. Although the comparison
is accurate in terms of recall and precision as calculated, the comparison
of three different sets of questions brings in a new variation, namely
the generality number (G). For the 221 questions G is 5.1, for the
35 questions G is 5.9, and for the 42 questions G is 3.4. The need to
allow for this difference in generality has been discussed, and Fig. 4.131P
is a graph that allows for this by use of a plot of recall and fallout.
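
The reason a recall and fallout plot allows for generality can be illustrated with a short sketch. It assumes the usual 2 x 2 retrieval table (a = relevant retrieved, b = non-relevant retrieved, c = relevant not retrieved, d = non-relevant not retrieved), with recall, precision and fallout as percentages and G taken as the number of relevant documents per 1000 documents in the collection. That definition of G is an assumption here, since this section does not restate it. The class and function names, and the counts in the example, are hypothetical and are not taken from the test results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTable:
    """2 x 2 table for one search (hypothetical helper, not from the report)."""
    a: int  # relevant documents retrieved
    b: int  # non-relevant documents retrieved
    c: int  # relevant documents not retrieved
    d: int  # non-relevant documents not retrieved


def recall(t: RetrievalTable) -> float:
    """Percentage of the relevant documents that were retrieved."""
    return 100.0 * t.a / (t.a + t.c)


def precision(t: RetrievalTable) -> float:
    """Percentage of the retrieved documents that were relevant."""
    return 100.0 * t.a / (t.a + t.b)


def fallout(t: RetrievalTable) -> float:
    """Percentage of the non-relevant documents that were retrieved."""
    return 100.0 * t.b / (t.b + t.d)


def generality(t: RetrievalTable) -> float:
    """Relevant documents per 1000 documents in the collection (assumed definition)."""
    return 1000.0 * (t.a + t.c) / (t.a + t.b + t.c + t.d)


def precision_at_generality(recall_pct: float, fallout_pct: float, g: float) -> float:
    """Precision (percent) implied by a given recall, fallout and generality.

    Follows from the table definitions: with r, f and g as fractions,
    precision = r*g / (r*g + f*(1 - g)).
    """
    r, f, gf = recall_pct / 100.0, fallout_pct / 100.0, g / 1000.0
    return 100.0 * (r * gf) / (r * gf + f * (1.0 - gf))


# Hypothetical search on a 1400 document collection: 7 relevant documents,
# 10 retrieved, 4 of them relevant.
example = RetrievalTable(a=4, b=6, c=3, d=1387)

# Hold recall and fallout fixed and vary only G, using the three generality
# numbers quoted in the text (5.1, 5.9, 3.4).
implied = {
    g: precision_at_generality(recall(example), fallout(example), g)
    for g in (5.1, 5.9, 3.4)
}
```

In this sketch the same recall and fallout give a different precision for each value of G, precision being lower where G is lower. A recall and precision comparison across question sets of differing generality therefore mixes a generality effect into the curves, whereas recall and fallout are each computed within the relevant or the non-relevant documents alone and do not depend on G. The sketch does not guard against empty rows or columns in the table (for example, a search retrieving nothing).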