CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Main test results chapter
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

It can now be argued that the performance results based on the smaller sets of questions give valid results. First, there is a close similarity in the relative differences obtained when five recall languages are compared, whichever set of questions is used. This may be seen by comparing Figs. 4.105P, 4.115P and 4.125P, where the relative differences between the five languages are very similar. A further comparison is made on a recall/fallout plot, in Fig. 4.131P, where generality is allowed for, and the language is held constant as type I.1.a. Some small variation is to be expected when the question sets are altered in size, and when a universal totalling method is to be used, but the subset of 42 questions, on which many of the later results are based, is seen to be representative of both the larger set of 221 questions and of the set with the chosen characteristic of each question having the same number of starting terms.

In the question sets shown so far, the collection size has remained constant at 1400 documents. Most of the later results have been obtained on the 200 document collection (collection subset i), and the validity of results based on the smaller collection size will be considered next. Table 4.140T gives the results for a search on language I.1.a made with the 42 questions searched on the 200 document collection. The results from the table are plotted as a performance curve in Fig. 4.140P. Also shown on this plot are the results from Fig. 4.120T, which are based on the same question set and language but searched on the 1400 collection.
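The measures involved in comparing collections of different size can be sketched in code. What follows is a minimal illustration, not the procedure used in the project. It assumes the usual 2x2 retrieval table (a = relevant retrieved, b = non-relevant retrieved, c = relevant not retrieved, d = non-relevant not retrieved) and a generality number expressed as relevant documents per 1000 documents in the collection. The names `RetrievalTable` and `precision_at_generality` are hypothetical, and the counts are invented so that the two collections share the same recall and fallout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalTable:
    """2x2 retrieval table at one cutoff (hypothetical helper, not from the report)."""
    a: int  # relevant, retrieved
    b: int  # non-relevant, retrieved
    c: int  # relevant, not retrieved
    d: int  # non-relevant, not retrieved

    @property
    def recall(self) -> float:
        return self.a / (self.a + self.c)

    @property
    def precision(self) -> float:
        return self.a / (self.a + self.b)

    @property
    def fallout(self) -> float:
        return self.b / (self.b + self.d)

    @property
    def generality(self) -> float:
        # Assumed convention: relevant documents per 1000 documents.
        total = self.a + self.b + self.c + self.d
        return 1000 * (self.a + self.c) / total


def precision_at_generality(recall: float, fallout: float, generality: float) -> float:
    """Precision implied by a recall/fallout pair at a given generality number."""
    g = generality / 1000  # fraction of the collection that is relevant
    return recall * g / (recall * g + fallout * (1 - g))


# Invented counts: 5 relevant documents in each collection, equal recall and fallout.
small = RetrievalTable(a=4, b=13, c=1, d=182)    # 200 documents
large = RetrievalTable(a=4, b=93, c=1, d=1302)   # 1400 documents

# Precision of the large collection re-expressed at the small collection's generality.
adjusted = precision_at_generality(large.recall, large.fallout, small.generality)

print(f"small: precision={small.precision:.3f} generality={small.generality:.1f}")
print(f"large: precision={large.precision:.3f} generality={large.generality:.1f}")
print(f"large, adjusted to the small collection's generality: {adjusted:.3f}")
```

In this invented case the raw precision of the two collections differs only because of generality: once the larger collection is re-expressed at the generality of the smaller one, the two precision values coincide. Recall and fallout are unaffected by the change in collection size, which is why a recall/fallout plot allows for generality directly.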
It would be expected that with a recall/precision plot, the increased generality number would result in a better performance for the searches on the 200 collection as compared to those on the 1400 collection. This is, in fact, the case, for while the recall at each coordination level is seen to be identical, the expected large increase in precision is seen when the 200 collection is searched. The effect of the change in generality on the precision ratio, as discussed in Chapter 3, is allowed for in Fig. 4.141P, which is a plot of the two curves using recall and fallout ratios, and in Fig. 4.142P, which plots the two curves on a recall/precision plot with generality adjusted to a constant of 23.6, this number being the generality of the situation in the 200 collection. The result for the 1400 collection is now no longer inferior to the 200 collection, and in fact the situation is reversed. The reason why the 200 collection now has a somewhat worse performance has been investigated in Chapter 3, where it has been shown that the cause of the difference can be adequately explained and allowed for; this was, in fact, done and the result was shown in Figure 3.34T.

The purpose of the introductory results presented is to demonstrate that test results based on a relatively small collection and set of questions do give valid results. The variations observed between the three different