CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
curves.
In the tests at Cranfield and in other tests where sufficient data
has been available, the samples which have been processed by both
methods have always shown this increase in precision with recall
remaining much the same. However, we do not wish to be misquoted
on this point and would emphasize that while it is probably true that
the average of ratios will usually give a better performance figure,
it would be wrong to assume that the proportional improvement would
always be as pronounced as in the example shown.
An evaluation showing one of the two methods to be superior is not
possible, since proponents of each can give good reasons for preferring
it to the other.
The theoretical cause of the discrepancy is the variation in the base
from question to question: in the case of the recall ratio it is the
number of relevant documents sought; in the precision ratio it is
the total retrieved; and in the fallout ratio it is the total non-relevant.
The average of numbers method weights the results of individual questions
according to the base, and a larger base exerts a greater influence on
the final result. The average of ratios completely ignores the base
variation. In situations outside retrieval tests, where similar data has
to be averaged, it is frequently advocated that the variation in base should
be allowed for, and the average of numbers used (see, for instance, Ref.
12, page 161). The difference in the results of the two methods is small
except when the range and distribution of the variation in base becomes
large, as is often the case with the precision ratio. However, both
methods appear to be equally reasonable for use in retrieval situations,
and the different results are really complementary viewpoints requiring
careful interpretation.
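In later terminology the average of ratios corresponds to what is often called macro-averaging, and the average of numbers to micro-averaging. The following sketch in Python (the counts are invented for illustration and are not taken from the Cranfield data) shows how variation in the base separates the two figures when applied to the precision ratio:

```python
# Illustrative sketch (invented figures): the two ways of averaging
# per-question precision ratios.  Each question contributes a pair
# (relevant_retrieved, total_retrieved); total_retrieved is the base.
questions = [
    (4, 5),     # small base: 4 relevant among 5 documents retrieved
    (30, 100),  # large base: 30 relevant among 100 documents retrieved
]

# Average of ratios: form each question's precision ratio, then average.
# The base is ignored, so both questions count equally.
average_of_ratios = sum(r / n for r, n in questions) / len(questions)

# Average of numbers: pool the counts first, then form a single ratio.
# The question with the larger base dominates the result.
average_of_numbers = sum(r for r, _ in questions) / sum(n for _, n in questions)

print(f"average of ratios:  {average_of_ratios:.3f}")   # (0.800 + 0.300) / 2 = 0.550
print(f"average of numbers: {average_of_numbers:.3f}")  # 34 / 105 = 0.324
```

With only two questions the gap is already large (0.550 against 0.324); as the text notes, the gap shrinks when the bases are of similar size.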
A description of the different viewpoints represented by the two
methods has been given by Salton (Ref. 13). He suggests that the average
of ratios is a 'query-oriented viewpoint', and the average of numbers is
a 'document-oriented viewpoint'; performance figures using the average
of ratios indicate the performance of a single typical search question,
typical, that is, of the set of questions used in the test. The use of the
average of numbers indicates the result for the whole set of questions
taken together, or the success in retrieving a given set of relevant documents
(287 in the example being used). This really ignores the actual individual
questions involved, since one question with 287 relevant documents could
in theory have the same result as 35 questions having in total 287 relevant
documents. Thus the average of numbers gives an arithmetical mean
value for a set of questions, and the average of ratios gives what
approximates to a 'median' value which reflects the performance of a
typical question.
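The point about the 287 relevant documents can be checked numerically. In the sketch below only the total of 287 comes from the text; the way the documents are distributed over 35 questions, and the number retrieved, are invented for illustration. The average of numbers for recall is identical whether the relevant documents belong to one question or to 35, while the average of ratios changes:

```python
# Hypothetical illustration: the average of numbers depends only on totals.
# Each question is a pair (relevant_retrieved, relevant_sought).

def average_of_numbers(questions):
    # Pool the counts over all questions, then take one ratio.
    return sum(r for r, _ in questions) / sum(s for _, s in questions)

def average_of_ratios(questions):
    # Take each question's recall ratio, then average the ratios.
    return sum(r / s for r, s in questions) / len(questions)

# Case A: a single question seeking all 287 relevant documents, 200 retrieved.
one_question = [(200, 287)]

# Case B: the same 287 relevant documents spread over 35 questions, again
# with 200 retrieved in total (28*8 + 7*9 = 287; 168 + 20 + 12 = 200).
thirty_five_questions = [(6, 8)] * 28 + [(5, 9)] * 4 + [(4, 9)] * 3

print(average_of_numbers(one_question))        # 200/287, about 0.697
print(average_of_numbers(thirty_five_questions))  # identical: 200/287
print(average_of_ratios(thirty_five_questions))   # about 0.702 (not 0.697)
```

The average of numbers cannot distinguish the two cases, which is exactly the sense in which it ignores the individual questions; the average of ratios reflects the per-question behaviour and so shifts once the 287 documents are divided up.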
Neither method appears to have any marked superiority over the
other as a means of presenting results. However, the decision to use
the average of numbers method in this volume was based on a most