CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Main test results chapter
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Section 1 Introductory Tables
The first set of results is based on 221 questions (question subset 3) searched on the full collection of 1400 documents. In Tables 4.100T to 4.104T, five of the languages investigated on the single term indexing are presented, namely languages 1.1.a, 1.2.a, 1.3.a, 1.5.a, and 1.6.a, showing the effect of recall devices. The results for these five languages are presented on a single plot in Figure 4.105P, with a performance curve for each of the languages plotted.
It was very difficult to find a satisfactory method of totalling the results with these 221 questions, because of the large variation in the number of starting terms. This problem was fully discussed in Chapter 3 (pages 51-71), but in order to validate the selected method, a subset of 35 questions was selected, the characteristic of which was that each question had seven starting terms, and in this respect was an average set. The results of searches for the same five index languages are presented in Figures 4.110T to 4.114T. The questions are again searched on the 1400 document collection, and a single plot of the five curves is given in Fig. 4.115P.
A further subset of questions, 42 in number (subset 2), is used for the results given in Figs. 4.120T to 4.124T. The same five languages are used, and the 1400 document collection is searched. These questions pose the same problems in totalling as did the 221 questions, since the 42 questions have varying numbers of starting terms, ranging from three to twelve. A single plot of the five languages is given in Fig. 4.125P.
In these three sets of questions, a progression may be observed from the largest set of questions (221) to a smaller set of 35 questions specially selected to minimise the totalling problem, and to another small set of 42 questions that has the same problem of totalling, due to the variation in number of starting terms, as the collection of 221 questions. The differences in question sample size may be expected to affect any direct comparison of the three sets of questions, in addition to the totalling method problem. The effect of this can be seen in Fig. 4.130P, where the natural language results (Language 1.1.a) are compared for the three subsets of questions, the three curves being based on the results in Figs. 4.100T, 4.110T, and 4.120T. Although the comparison is accurate in terms of recall and precision as calculated, the comparison of three different sets of questions brings in a new variation, namely the generality number (G). For the 221 questions G is 5.1, for the 35 questions G is 5.9, and for the 42 questions G is 3.4. The need to allow for this difference in generality has been discussed, and Fig. 4.131P is a graph that allows for this by use of a plot of recall and fallout.
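The relationship between recall, precision, fallout and generality referred to above can be illustrated with a short sketch. It is not taken from the report: the class and function names are hypothetical, the counts in `example` are invented purely for illustration, and the generality number is assumed to be the number of relevant documents per 1000 documents in the collection. Denominators are assumed to be non-zero.

```python
from dataclasses import dataclass


@dataclass
class SearchCounts:
    """Document counts for one search, or totalled over several searches.

    Hypothetical container for illustration only; the field names are not
    from the report.
    """
    relevant_retrieved: int
    nonrelevant_retrieved: int
    relevant_not_retrieved: int
    nonrelevant_not_retrieved: int

    @property
    def relevant(self) -> int:
        return self.relevant_retrieved + self.relevant_not_retrieved

    @property
    def nonrelevant(self) -> int:
        return self.nonrelevant_retrieved + self.nonrelevant_not_retrieved

    @property
    def collection_size(self) -> int:
        return self.relevant + self.nonrelevant


def recall(c: SearchCounts) -> float:
    """Fraction of the relevant documents that were retrieved."""
    return c.relevant_retrieved / c.relevant


def precision(c: SearchCounts) -> float:
    """Fraction of the retrieved documents that are relevant."""
    retrieved = c.relevant_retrieved + c.nonrelevant_retrieved
    return c.relevant_retrieved / retrieved


def fallout(c: SearchCounts) -> float:
    """Fraction of the non-relevant documents that were retrieved."""
    return c.nonrelevant_retrieved / c.nonrelevant


def generality_number(c: SearchCounts) -> float:
    """Assumed definition: relevant documents per 1000 in the collection."""
    return 1000.0 * c.relevant / c.collection_size


def precision_from_recall_fallout(r: float, f: float, g: float) -> float:
    """Recover precision from recall r, fallout f and generality number g."""
    relevant_part = r * g
    nonrelevant_part = f * (1000.0 - g)
    return relevant_part / (relevant_part + nonrelevant_part)


# Invented counts for a 1400-document collection (illustration only).
example = SearchCounts(
    relevant_retrieved=5,
    nonrelevant_retrieved=15,
    relevant_not_retrieved=2,
    nonrelevant_not_retrieved=1378,
)
```

Under these assumptions, recall and fallout are each computed within a single class of documents (relevant or non-relevant), so neither changes when the proportion of relevant documents changes. Precision mixes the two classes and therefore shifts with G, which is one way to read the preference for a recall and fallout plot when question sets of different generality are compared.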