CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
As an example, two collections are hypothesised (see Fig. 3.32T),
Collection A having 1000 documents and Collection B having 10,000
documents. In both collections there are assumed to be ten relevant
documents for a given question, giving a generality number of 10 for
Collection A and 1 for Collection B. It is hypothesised that the recall
ratio is 50% and that the proportion of non-relevant retrieved to collection
size remains the same. The fact that the proportion of non-relevant retrieved
remains the same means that the fallout ratio will be 1.0%*, although
the precision ratio changes from 33.3% in Collection A to 4.8% in
Collection B, reflecting the decrease in the generality number. A
recall/fallout plot would indicate an identical performance, concealing
the information that in Collection A a fallout ratio of 1.0% means the
retrieval of ten non-relevant documents and in Collection B it means the
retrieval of one hundred non-relevant documents. On the other hand a
plot of recall/precision would correctly indicate this change.
COLLECTION A
1000 DOCUMENTS
                Relevant   Non-Relevant    Total      Generality   10
Retrieved           5           10            15      Recall       50%
Not Retrieved       5          980           985      Fallout      1.0%
Total              10          990         1,000      Precision    33.3%
COLLECTION B
10,000 DOCUMENTS
                Relevant   Non-Relevant    Total      Generality   1
Retrieved           5          100           105      Recall       50%
Not Retrieved       5         9890          9895      Fallout      1.0%
Total              10         9990        10,000      Precision    4.8%
FIGURE 3.32T
TWO SETS OF PERFORMANCE RESULTS WITH
DIFFERENT GENERALITY NUMBERS AND CONSTANT
RECALL AND FALLOUT RATIOS.
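
The ratios in Fig. 3.32T can be checked with a short calculation. The following is a minimal sketch in Python, not part of the original report; the class name `Contingency` and its field names are hypothetical, and the generality number is assumed to be the number of relevant documents per 1000 documents in the collection, which is consistent with the values of 10 and 1 given in the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2 x 2 retrieval table for one question (hypothetical helper)."""
    relevant_retrieved: int
    nonrelevant_retrieved: int
    relevant_not_retrieved: int
    nonrelevant_not_retrieved: int

    @property
    def relevant(self) -> int:
        return self.relevant_retrieved + self.relevant_not_retrieved

    @property
    def nonrelevant(self) -> int:
        return self.nonrelevant_retrieved + self.nonrelevant_not_retrieved

    @property
    def collection_size(self) -> int:
        return self.relevant + self.nonrelevant

    def recall(self) -> float:
        # Percentage of the relevant documents that were retrieved.
        return 100 * self.relevant_retrieved / self.relevant

    def fallout(self) -> float:
        # Percentage of the non-relevant documents that were retrieved.
        return 100 * self.nonrelevant_retrieved / self.nonrelevant

    def precision(self) -> float:
        # Percentage of the retrieved documents that are relevant.
        retrieved = self.relevant_retrieved + self.nonrelevant_retrieved
        return 100 * self.relevant_retrieved / retrieved

    def generality(self) -> float:
        # Assumed definition: relevant documents per 1000 in the collection.
        return 1000 * self.relevant / self.collection_size


# Cell values taken from Fig. 3.32T.
collection_a = Contingency(5, 10, 5, 980)
collection_b = Contingency(5, 100, 5, 9890)

for name, table in (("A", collection_a), ("B", collection_b)):
    print(
        f"Collection {name}: generality {table.generality():g}, "
        f"recall {table.recall():.1f}%, fallout {table.fallout():.4f}%, "
        f"precision {table.precision():.1f}%"
    )
```

Printing fallout to four decimal places shows the small difference between the two collections that disappears when the ratio is rounded to one decimal place, while the precision ratio differs substantially.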
For a comparison of retrieval performance, it can be argued
that the result revealed by the fallout ratios is more useful, since the
change in precision ratio is solely due to the change in the environmental
factor of the generality number. However, we have earlier stated our
intention to present the main body of results with recall/precision plots, on the
ground that these, in general, make a more useful and comprehensible
*This is correct to one decimal place; the actual figures are, respectively,
1.0101% recurring and 1.001001% recurring.