Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Chapter: Methods for presentation of results
Cyril Cleverdon, Michael Keen (Cranfield)
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

A comparison of the recall ratio with the fallout ratio can be made in the same way. We are not aware of any previous occasions when the fallout ratio has been used for presenting test results, although Swets (Ref. 4) has discussed its possible use. In that it measures the ratio of the non-relevant retrieved to the total non-relevant in the collection, $b/(b+d)$, it is very sensitive to N, the total number of documents in the collection. While it might not be found to be particularly satisfactory for tests on operational systems, it has an attraction in experimental testing where collections of different but known size are being tested, since it automatically compensates for the changes in size.

Fig. 3.5T takes the figures of Fig. 3.3T and Fig. 3.4T and replaces the precision ratio by the fallout ratio. A characteristic of fallout ratios is that they tend to be concentrated at low numbers; for this reason the figures are taken to three places of decimals, and the resultant plot of recall ratio against fallout ratio is clearer if made on a semi-log scale, as in Fig. 3.5P. In this case the better performance is obtained when the curve is nearer the top left-hand corner, whereas the recall-precision curve is optimised towards the top right-hand corner. Therefore, as in Fig. 3.4P, search Y is shown to give a generally improved performance over search X.
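The ratios discussed above can be illustrated with a minimal Python sketch. It assumes the usual 2x2 retrieval table, with a = relevant retrieved and b = non-relevant retrieved as in the text, and with c = relevant not retrieved and d = non-relevant not retrieved taken as the conventional reading of the remaining letters. The class name `Contingency` and the two sets of counts are hypothetical; they are not taken from Fig. 3.3T, Fig. 3.4T or Fig. 3.5T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Contingency:
    """2x2 retrieval table for one search (hypothetical helper)."""
    a: int  # relevant, retrieved
    b: int  # non-relevant, retrieved
    c: int  # relevant, not retrieved (assumed meaning)
    d: int  # non-relevant, not retrieved (assumed meaning)

    @property
    def n(self) -> int:
        # N, the total number of documents in the collection
        return self.a + self.b + self.c + self.d

    def recall(self) -> float:
        return self.a / (self.a + self.c)

    def precision(self) -> float:
        return self.a / (self.a + self.b)

    def fallout(self) -> float:
        # non-relevant retrieved over total non-relevant in the collection
        return self.b / (self.b + self.d)


# Hypothetical counts: the same retrieved set in two collections of
# different size, so that only d (and therefore N) changes.
small = Contingency(a=4, b=16, c=1, d=179)   # N = 200
large = Contingency(a=4, b=16, c=1, d=1379)  # N = 1400

for table in (small, large):
    print(
        table.n,
        round(table.recall(), 3),
        round(table.precision(), 3),
        round(table.fallout(), 3),
    )
```

With these invented counts, recall and precision are identical for the two collections, while fallout falls from 16/195 to 16/1395 as N grows. This is the sensitivity to N that the text describes, and it also shows why three decimal places are needed to tabulate fallout.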
Either of these twin measures is satisfactory for presenting the performance of systems where the generality number is held constant, although the argument has been advanced that a plot of recall/precision is not valid since both ratios contain a (relevant retrieved). It has been incorrectly argued that in plotting $a/(a+c)$ against $a/(a+b)$ all the a's cancel out, with the result that the factors being plotted are c against b. Fairthorne (Ref. 5) has said that a more reliable precision ratio is given by what he calls the 'distillation ratio', which is $a/(a+b) - c/(c+d)$. However, he agrees that when the correction factor of $c/(c+d)$ is negligible compared with the precision ratio, the latter is a valid measure. In fact, in the results presented in Fig. 3.3T, the correction factor at the coordination level of five terms is 0.0038, which can definitely be considered negligible.

Rees (Ref. 6) argues against the precision ratio in favour of a measure that is complementary to fallout, namely $d/(b+d)$, on the grounds that it takes into account one of the vital parameters in a retrieval system: size of file. To some extent this is true, but it is a matter which has to be approached very carefully. The difficulty lies in determining exactly what is the correct value of N, that is to say, how many documents can validly be considered to form the total collection in regard to any question. This matter is considered in more detail later in this chapter. It is true that the same difficulty arises in calculating the generality number, but if N is known, then it is just as easy to calculate the generality number as to