CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Chapter: Methods for presentation of results
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

As an example, two collections are hypothesised (see Fig. 3.32T), Collection A having 1,000 documents and Collection B having 10,000 documents. In both collections there are assumed to be ten relevant documents for a given question, giving a generality number of 10 for Collection A and 1 for Collection B. It is hypothesised that the recall ratio is 50% and that the proportion of non-relevant documents retrieved to collection size remains the same. Because this proportion remains the same, the fallout ratio will be 1.0%* in both collections, although the precision ratio changes from 33.3% in Collection A to 4.8% in Collection B, reflecting the decrease in the generality number. A recall/fallout plot would indicate an identical performance, concealing the information that in Collection A a fallout ratio of 1.0% means the retrieval of ten non-relevant documents and in Collection B it means the retrieval of one hundred non-relevant documents. On the other hand, a plot of recall/precision would correctly indicate this change.

COLLECTION A: 1,000 DOCUMENTS

                 Relevant   Non-Relevant    Total      Generality  10
Retrieved             5            10          15      Recall      50%
Not Retrieved         5           980         985      Fallout     1.0%
Total                10           990       1,000      Precision   33.3%

COLLECTION B: 10,000 DOCUMENTS

                 Relevant   Non-Relevant    Total      Generality  1
Retrieved             5           100         105      Recall      50%
Not Retrieved         5         9,890       9,895      Fallout     1.0%
Total                10         9,990      10,000      Precision   4.8%

FIGURE 3.32T TWO SETS OF PERFORMANCE RESULTS WITH DIFFERENT GENERALITY NUMBERS AND CONSTANT RECALL AND FALLOUT RATIOS.
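The arithmetic behind Fig. 3.32T can be checked with a short sketch. The class and field names below are hypothetical and not part of the report. The generality number is taken here to be the number of relevant documents per 1,000 documents in the collection, an assumption that is consistent with the values of 10 and 1 quoted for the two collections.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 retrieval table for one question (hypothetical helper)."""
    relevant_retrieved: int
    nonrelevant_retrieved: int
    relevant_missed: int
    nonrelevant_missed: int

    @property
    def total_relevant(self) -> int:
        return self.relevant_retrieved + self.relevant_missed

    @property
    def total_nonrelevant(self) -> int:
        return self.nonrelevant_retrieved + self.nonrelevant_missed

    @property
    def collection_size(self) -> int:
        return self.total_relevant + self.total_nonrelevant

    @property
    def generality(self) -> float:
        # Assumed definition: relevant documents per 1,000 in the collection.
        return 1000 * self.total_relevant / self.collection_size

    @property
    def recall(self) -> float:
        # Relevant retrieved / all relevant.
        return self.relevant_retrieved / self.total_relevant

    @property
    def fallout(self) -> float:
        # Non-relevant retrieved / all non-relevant.
        return self.nonrelevant_retrieved / self.total_nonrelevant

    @property
    def precision(self) -> float:
        # Relevant retrieved / all retrieved.
        retrieved = self.relevant_retrieved + self.nonrelevant_retrieved
        return self.relevant_retrieved / retrieved


# Cell counts copied from Fig. 3.32T.
collection_a = Contingency(5, 10, 5, 980)
collection_b = Contingency(5, 100, 5, 9890)

for name, c in (("A", collection_a), ("B", collection_b)):
    print(
        f"Collection {name}: generality {c.generality:g}, "
        f"recall {c.recall:.1%}, fallout {c.fallout:.1%}, "
        f"precision {c.precision:.1%}"
    )
```

Rounded to one decimal place, recall and fallout agree for the two collections while precision does not. The unrounded fallout values, 10/990 and 100/9990, differ slightly from each other.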
For a comparison of retrieval performance, it can be argued that the result revealed by the fallout ratios is more useful, since the change in precision ratio is solely due to the change in the environmental factor of the generality number. However, we have earlier stated our intention to present the main body of results with recall/precision plots, on the ground that these, in general, make a more useful and comprehensible

*This is correct to one decimal place; the actual figures are, respectively, 1.0101% recurring and 1.001001% recurring.