CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Test Environment
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
is always given against any figures which have been calculated on a
reduced set. At the single-term level, it can be seen that no searches
were made, and therefore no figures can be estimated for precision or
fallout ratios.
There were various possible procedures for estimating these figures,
and these can be illustrated by reference to Fig. 2.15, which deals with
the 35 questions subset searched on 1400 documents by Index Language
1.5.a. Since all the questions had seven starting terms, z remains
constant throughout. However, at a coordination level of 2, it is shown
in column y that only 23 questions were searched. It was found that,
with these 23 questions, 8,565 non-relevant documents were retrieved
together with 157 relevant documents. The simplest way of estimating
the total non-relevant for the complete subset of 35 questions would be
to scale up the above figure of 8,565 in the ratio of 35/23, which would
give a total of 13,033 non-relevant documents. On the basis of this
figure the precision and fallout ratios* could now be calculated. A
second method is first to determine the precision ratio for the 23 questions
searched; in this case it works out at 1.8%. It is known that the 35
questions retrieved 253 relevant documents; to maintain the precision ratio
of 1.8% the total of non-relevant is scaled up by 253/157, namely the totals
of relevant documents retrieved in the full set and in the subset. This gives
a figure of 13,803 and from this the fallout ratio can be calculated.
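The two estimating procedures amount to simple ratio scaling, and can be sketched in a few lines of Python. The variable names are mine; all figures are those quoted in the text (the report's printed totals of 13,033 and 13,803 differ by a unit from exact division, presumably from its own rounding):

```python
# Figures quoted in the text for the 35-question subset, Index Language
# 1.5.a, searched on 1,400 documents at coordination level 2.
searched = 23      # questions actually searched
total = 35         # questions in the full subset
nonrel = 8565      # non-relevant documents retrieved by the 23 searches
rel = 157          # relevant documents retrieved by the 23 searches
rel_full = 253     # relevant documents retrieved by all 35 questions

# Method 1: scale the non-relevant total in the ratio of question counts.
est_by_questions = nonrel * total / searched   # ~13,034 (text: 13,033)

# Method 2: hold the precision ratio constant, i.e. scale the
# non-relevant total in the ratio of relevant-document totals.
precision = rel / (rel + nonrel)               # ~1.8%
est_by_relevant = nonrel * rel_full / rel      # ~13,802 (text: 13,803)

print(round(precision * 100, 1), round(est_by_questions), round(est_by_relevant))
```

Either estimated non-relevant total can then be combined with the known relevant total to calculate the precision and fallout ratios for the full subset.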
The accuracy of these scaled up results will depend on whether
the sample of questions that were searched is typical of the whole set.
It is unlikely that this was the case; as stated earlier, questions were
not searched when they would retrieve an excessive number of non-
relevant documents, so conversely the questions which were searched,
and which are therefore in the sample, were those which had fewer
non-relevant documents. Scaling-up from the sample could therefore
be expected to give a somewhat higher precision figure than was really
the case.
To check on this, we can consider the actual situation in regard to
the same set of questions with Index Language 1.1.a, on which, as
previously mentioned, searches were made down to the single-term
level.
In this language, at a coordination level of 2, the 23 questions
retrieved 3871 documents. By the methods already suggested, the
estimated figures would have been 6043 and 6476 respectively. In fact,
the correct figure is 8086, and bears out the expectation expressed in
the previous paragraph. This was also checked at the coordination level
of 3, and again it was found that the remaining 12 searches retrieved
*The method of calculating these ratios is discussed in Chapter 3.