CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
presentation of performance. It is therefore necessary to make
adjustments to the precision ratios in certain situations (which have
been considered in Chapter 2) where sets of varying generality
have to be compared. This is reasonably straightforward and is
obtained by the following equation:-
\[
\mathrm{PA}\ \text{(Adjusted Precision Ratio)} = \frac{R1 \times G2}{(R1 \times G2) + F1\,(1000 - G2)}
\]
where R1 = Recall ratio obtained for a given system, in a
situation of a known generality number
F1 = Fallout ratio obtained for the given system, in
a situation of a known generality number
G2 = Generality number to which it is desired to alter
the results, to obtain the adjusted precision
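
The equation can be sketched in code as follows. This is only an illustrative sketch: the function name and argument names are hypothetical, and it assumes, as the worked figures and the constant 1000 in the equation suggest, that the recall and fallout ratios are entered as fractions and that the generality number is expressed per 1000 documents. The input values in the usage line are invented for illustration and are not taken from the test results.

```python
def adjusted_precision(recall, fallout, target_generality):
    """Adjusted precision ratio PA for a chosen generality number.

    recall            -- R1, recall ratio as a fraction (0.50 means 50%)
    fallout           -- F1, fallout ratio as a fraction (0.01 means 1%)
    target_generality -- G2, assumed to be relevant documents per 1000
    """
    if not 0 <= target_generality <= 1000:
        raise ValueError("generality number must lie between 0 and 1000")

    # Numerator of the equation: R1 x G2
    relevant_retrieved = recall * target_generality
    # Second term of the denominator: F1 (1000 - G2)
    nonrelevant_retrieved = fallout * (1000 - target_generality)

    total_retrieved = relevant_retrieved + nonrelevant_retrieved
    if total_retrieved == 0:
        raise ValueError("nothing retrieved; precision is undefined")
    return relevant_retrieved / total_retrieved


# Hypothetical inputs: 80% recall, 2% fallout, adjusted to generality 5.
print(round(adjusted_precision(0.80, 0.02, 5), 3))  # 0.167
```
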
Thus two sets of performance figures obtained with systems of
differing generality can be compared by adjusting the precision ratio of
one case, so that it is based on the generality number of the other. If
the example in Fig. 3.32T were to be corrected, and if it were decided
to alter the result of Collection A to fit the generality of Collection B,
then, from the equation given above,
\[
\mathrm{PA} = \frac{.50 \times 1}{(.50 \times 1) + .01\,(1000 - 1)} = \frac{.50}{.50 + 9.99} = .048
\]
The answer, expressed as a percentage, is 4.8%, and this result is
clearly correct, with both cases now having an identical recall ratio,
fallout ratio and precision ratio.
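
One way to see why the figure is correct is to count documents directly. The sketch below assumes a hypothetical collection of 1000 documents containing one relevant document (a generality number of 1), with the recall ratio of .50 and fallout ratio of .01 from the example treated as averages; the collection size and variable names are assumptions made for illustration only.

```python
def precision_from_counts(relevant_retrieved, nonrelevant_retrieved):
    """Precision ratio computed directly from retrieved-document counts."""
    return relevant_retrieved / (relevant_retrieved + nonrelevant_retrieved)


# Hypothetical collection: 1000 documents, 1 of them relevant.
collection_size = 1000
relevant = 1
nonrelevant = collection_size - relevant  # 999

recall = 0.50   # fraction of relevant documents retrieved
fallout = 0.01  # fraction of non-relevant documents retrieved

relevant_retrieved = recall * relevant          # 0.5 on average
nonrelevant_retrieved = fallout * nonrelevant   # 9.99 on average

pa = precision_from_counts(relevant_retrieved, nonrelevant_retrieved)
print(round(pa, 3))  # 0.048
```

The count-based precision agrees with the adjusted value of .048, since the terms of the equation are simply the expected numbers of relevant and non-relevant documents retrieved per 1000 documents.
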
This, however, is a simplified example, and in practice the matter is
complicated by what at present seems to be the most difficult problem in
performance comparison, namely the determination of the correct N (the
size of the collection). To consider this, an actual result is taken from
a particular set of 42 questions that were searched on collections A and
B where N equals 200 and 1400 documents respectively, the documents
in collection A being a subset of the documents in collection B. The details
are given in Fig. 3.33T, with the two sets of performance figures obtained
in exactly the same conditions. While the precision ratio for collection A
has increased with the increased generality number, there is also a
significant difference in the fallout ratio.