CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems
Volume 2
Methods for presentation of results (chapter)
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

... of every one of the 221 questions.

Method 5 differs from all the other methods described so far in being based on actual retrieval results obtained in testing. The method was generally known as 'recall levels', because a series of recall ratios is chosen in advance, and the performance results closest to these chosen recall levels are used to obtain the totals, irrespective of the coordination level of the search terms. Ideally, this method should be applied to each individual question in a set, recording the recall and precision ratios attained by each question when its recall is closest to 5%, then to 10%, and so on. The calculations for Method 5 approximated this by using the recall levels of the nine retrieving term groups. The recall ratios of these retrieving term groups were arranged into a set of twenty-one recall levels (0%, 5%, 10%, and so on up to 100%), and the figures thus arranged were used to obtain twenty-one sets of recall and precision ratios. Fig. 3.31TP gives the table and plot of results; the large number of performance points on the plot shows a slight scatter, through which the performance curve is drawn.

Method 6 was known as the 'document output cutoff method', and was based on principles quite different from those already discussed. To explain this method, it is first necessary to consider the effect of the 'conventional' search cutoff method used in the test.
This, as has been explained, was based on the coordination level: with a six-term question, for instance, the search result would be recorded for a coordination of all six terms, then for a coordination of five terms, then for four terms, and so on. It was this method of search cutoff, applied to questions having different numbers of potential coordination levels, that caused the problem in totalling the results of the whole set of questions, and Method 6, with its document output cutoff, seemed to overcome this problem. To apply this method, it was first necessary to obtain a ranked order of documents for every question, and, in our case, this had to be based on the coordination level cutoff results. A method of doing this was developed, but it entailed a considerable amount of effort.

The decision as to which method to use for presentation of the results was not easy to make, and has probably involved more discussion, both amongst ourselves and with other people, than any other single aspect of the test. The need for this particular series of attempts to total the results arose from the problem created by the coordination level cutoff. It seems reasonable to assume that the final method discussed, the document output cutoff method, would be the most satisfactory, since it eliminated the basic problem of totalling different sets of results; but it appeared to involve more effort than could be afforded.
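The totalling problem created by the coordination level cutoff, and the two remedies of Methods 5 and 6, can be illustrated with a small sketch. This is not the procedure used in the test. The data model (`Question`, a dictionary of document index terms), all function names, the toy collection (`DOCS`, `QUESTIONS`), the choice to total by summing counts across questions, and the tie-breaking rule within a coordination level are assumptions made for illustration. `recall_levels` follows the idealized per-question form of Method 5 rather than the nine-term-group approximation actually used, and every question is assumed to have at least one relevant document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Question:
    terms: frozenset[str]     # search terms of the question
    relevant: frozenset[str]  # ids of documents judged relevant (assumed non-empty)


Docs = dict[str, frozenset[str]]  # hypothetical: document id -> index terms


def coordination(q: Question, docs: Docs) -> dict[str, int]:
    """Number of question terms matched by each document; non-matching documents are dropped."""
    matches = {d: len(q.terms & t) for d, t in docs.items()}
    return {d: n for d, n in matches.items() if n > 0}


def coordination_cutoff(q: Question, docs: Docs) -> list[tuple[int, int, int]]:
    """Conventional cutoff: one result per coordination level, from all terms down to one.

    Each result is (level, relevant retrieved, total retrieved).
    """
    coord = coordination(q, docs)
    results = []
    for level in range(len(q.terms), 0, -1):
        retrieved = {d for d, n in coord.items() if n >= level}
        results.append((level, len(retrieved & q.relevant), len(retrieved)))
    return results


def ratios(rel_ret: int, ret: int, rel: int) -> tuple[float, float | None]:
    """Recall and precision in percent from summed counts (precision None if nothing retrieved)."""
    return 100 * rel_ret / rel, (100 * rel_ret / ret if ret else None)


def recall_levels(questions: list[Question], docs: Docs, step: int = 5):
    """Method 5 (idealized): for each preset recall level, take from every question the
    coordination-level result whose recall is nearest that level, then total the counts."""
    table = []
    for target in range(0, 101, step):
        rel_ret = ret = rel = 0
        for q in questions:
            _, a, b = min(coordination_cutoff(q, docs),
                          key=lambda res: abs(100 * res[1] / len(q.relevant) - target))
            rel_ret, ret, rel = rel_ret + a, ret + b, rel + len(q.relevant)
        table.append((target, *ratios(rel_ret, ret, rel)))
    return table


def document_cutoff(questions: list[Question], docs: Docs, cutoffs=(1, 2, 5, 10)):
    """Method 6: rank each question's documents by coordination level, then total the
    results at the same fixed numbers of output documents for every question."""
    table = []
    for n in cutoffs:
        rel_ret = ret = rel = 0
        for q in questions:
            coord = coordination(q, docs)
            # Assumption: ties within one coordination level are broken by document id.
            top = sorted(coord, key=lambda d: (-coord[d], d))[:n]
            rel_ret += len(set(top) & q.relevant)
            ret += len(top)
            rel += len(q.relevant)
        table.append((n, *ratios(rel_ret, ret, rel)))
    return table


# Hypothetical toy collection: two questions with different numbers of search terms.
DOCS: Docs = {
    "d1": frozenset({"wing", "flutter", "supersonic"}),
    "d2": frozenset({"wing", "flutter"}),
    "d3": frozenset({"wing"}),
    "d4": frozenset({"flutter", "panel"}),
}
QUESTIONS = [
    Question(frozenset({"wing", "flutter", "supersonic"}), frozenset({"d1", "d2"})),
    Question(frozenset({"flutter", "panel"}), frozenset({"d4"})),
]

if __name__ == "__main__":
    for q in QUESTIONS:
        print(coordination_cutoff(q, DOCS))
    print(recall_levels(QUESTIONS, DOCS, step=50))
    print(document_cutoff(QUESTIONS, DOCS, cutoffs=(1, 2, 5)))
```

The toy collection exposes the totalling problem: the first question produces three coordination-level results and the second only two, so their rows cannot simply be added level by level. Both `recall_levels` and `document_cutoff` instead yield one row per preset value (a recall level, or a number of output documents) that every question shares, which is what makes a single set of totals possible.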