CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Main test results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
84
It can now be argued that the performance results based on
the smaller sets of questions give valid results. First, there
is a close similarity in the relative differences obtained when
five recall languages are compared, whichever set of questions is
used. This may be seen by comparing Figs. 4.105P, 4.115P
and 4.125P where the relative differences between the five languages
are very similar. A further comparison is made on a recall fallout
plot, in Fig. 4.131P, where generality is allowed for, and the
language is held constant as type I.1.a. Some small variation
is to be expected when the question sets are altered in size, and
when a universal totalling method is to be used, but the subset
of 42 questions, on which many of the later results are based, is seen
to be representative of both the larger set of 221 questions and of
the set with the chosen characteristic of each question having the
same number of starting terms.
In the question sets shown so far, the collection size has
remained constant at 1400 documents. Most of the later results have
been obtained on the 200 document collection (collection subset i),
and the validity of results based on the smaller collection size will be
considered next.
Table 4.140T gives the results for a search on language I.1.a made
with the 42 questions searched on the 200 document collection. The
results from the table are plotted as a performance curve in Fig.
4.140P. Also shown on this plot are the results from Fig. 4.120T,
which are based on the same question set and language but searched
on the 1400 collection. It would be expected that with a recall/precision
plot, the increased generality number would result in a better
performance for the searches on the 200 collection as compared to those
on the 1400 collection. This is, in fact, the case, for while the recall
at each coordination level is seen to be identical, the expected large
increase in precision is seen when the 200 collection is searched.
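
The effect described above can be illustrated with the ratios computed
from a two-by-two retrieval table. The following is a minimal sketch in
Python and is not taken from the report. The class name, the field names
and all the counts are hypothetical, and the definitions assumed are the
usual ones: recall = a/(a+c), precision = a/(a+b), fallout = b/(b+d), and
generality number = 1000(a+c)/N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchOutcome:
    """Two-by-two retrieval table for one search (hypothetical helper)."""
    relevant_retrieved: int      # a
    nonrelevant_retrieved: int   # b
    relevant_missed: int         # c
    nonrelevant_missed: int      # d

    def collection_size(self) -> int:
        return (self.relevant_retrieved + self.nonrelevant_retrieved
                + self.relevant_missed + self.nonrelevant_missed)

    def recall(self) -> float:
        # Share of the relevant documents that were retrieved.
        return self.relevant_retrieved / (
            self.relevant_retrieved + self.relevant_missed)

    def precision(self) -> float:
        # Share of the retrieved documents that were relevant.
        return self.relevant_retrieved / (
            self.relevant_retrieved + self.nonrelevant_retrieved)

    def fallout(self) -> float:
        # Share of the non-relevant documents that were retrieved.
        return self.nonrelevant_retrieved / (
            self.nonrelevant_retrieved + self.nonrelevant_missed)

    def generality_number(self) -> float:
        # Relevant documents per 1000 documents in the collection
        # (the per-thousand scaling is an assumption of this sketch).
        relevant = self.relevant_retrieved + self.relevant_missed
        return 1000.0 * relevant / self.collection_size()


# Hypothetical counts: the same 5 relevant documents, 4 of them retrieved,
# in a 200 document collection and in a 1400 document collection where
# seven times as many non-relevant documents are retrieved.
small = SearchOutcome(4, 6, 1, 189)      # N = 200
large = SearchOutcome(4, 42, 1, 1353)    # N = 1400

for name, outcome in (("200", small), ("1400", large)):
    print(name,
          "recall=%.3f" % outcome.recall(),
          "precision=%.3f" % outcome.precision(),
          "fallout=%.4f" % outcome.fallout(),
          "generality=%.1f" % outcome.generality_number())
```

With these invented counts the recall is the same for both collections
and the fallout is nearly the same, while the precision is much higher in
the smaller collection, where the generality number is higher.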
The effect of the change in generality on the precision ratio, as
discussed in Chapter 3, is allowed for in Fig. 4.141P, which is a
plot of the two curves using recall and fallout ratios, and Fig. 4.142P
which plots the two curves on a recall/precision plot with generality
adjusted to a constant of 23.6, this number being the generality of
the situation in the 200 collection. The result for the 1400 collection
is now no longer inferior to the 200 collection, and in fact the situation
is reversed. The reason why the 200 collection now has a somewhat
worse performance has been investigated in Chapter 3, where it has
been shown that the cause of the difference can be adequately explained
and allowed for; this was, in fact, done and the result was shown in
Figure 3.34T.
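
One way to picture the adjustment to a constant generality is to
recompute precision from recall and fallout at a chosen generality
number. The sketch below is an illustration in Python under stated
assumptions and is not the report's own procedure: it assumes the
generality number is expressed per 1000 documents and uses the identity
P = RG / (RG + F(1000 - G)), which follows from the usual definitions of
the four ratios. The function name and the recall and fallout values are
hypothetical; only the figure 23.6 comes from the text.

```python
def precision_at_generality(recall: float, fallout: float,
                            generality_per_1000: float) -> float:
    """Precision implied by a recall/fallout pair at a given generality.

    Hypothetical helper: assumes generality is the number of relevant
    documents per 1000 documents in the collection.
    """
    relevant_part = recall * generality_per_1000
    nonrelevant_part = fallout * (1000.0 - generality_per_1000)
    denominator = relevant_part + nonrelevant_part
    if denominator == 0.0:
        return 0.0  # nothing retrieved, so precision is taken as zero here
    return relevant_part / denominator


# Invented recall/fallout points standing in for one coordination level.
recall, fallout = 0.80, 0.03

# The same point shown at the generality of the 200 collection (23.6, from
# the text) and at a generality one seventh of that, as a stand-in for a
# collection seven times larger with the same relevant documents.
for g in (23.6, 23.6 / 7):
    print("generality=%.2f precision=%.3f"
          % (g, precision_at_generality(recall, fallout, g)))
```

Holding the generality number fixed when comparing two collections
removes the part of the precision difference that is due only to the
density of relevant documents, so any difference that remains reflects
the recall and fallout performance.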
The purpose of the introductory results presented is to demonstrate
that test results based on a relatively small collection and set of questions
do give valid results. The variations observed between the three different