CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Methods for presentation of results chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

[Table: number of starting terms (rows) against number of retrieving terms (columns, 2 to 10). Individual cell values are not recoverable here. Column totals: 6, 28, 56, 45, 36, 32, 10, 7, 1; grand total 221. Row totals for 7 to 15 starting terms: 35, 27, 26, 20, 17, 7, 4, 1, 3.]

FIGURE 3.20T DISTRIBUTION OF THE 221 QUESTIONS BY STARTING TERMS AND RETRIEVING TERMS, IN ONE PARTICULAR TEST.

The table in Fig. 3.20T may be considered as showing how, in two respects, the 221 questions are a heterogeneous set. Various subsets of the 221 can be picked to overcome the variations, and truly homogeneous subsets occupy each cell in the table; e.g. the group with five starting terms and four retrieving terms is the largest such subset, having a total of eighteen questions. A partially homogeneous subset, on the basis of one common characteristic only (either starting terms or retrieving terms), was the first to be examined in an attempt to find a method of totalling the whole set. The subset of seven-starting-term questions was chosen and totalled by simply adding up the results of each question at the seven possible coordination levels, resulting in seven totals. These totals are shown in Fig. 3.21T, together with the recall and precision percentages, these being calculated by the average of numbers, that is, from the summed totals rather than by averaging the ratios of the individual questions. The seven average recall and precision ratios are plotted in Fig. 3.21P, thus producing a performance curve for 35 questions as the exhaustivity of search is altered by coordination level.
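The totalling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Question` record, the function name `average_of_numbers`, and the three sample questions are hypothetical and are not taken from Fig. 3.21T. The sketch also assumes that a question giving no result at a level adds nothing to the retrieved totals at that level, while its relevant documents remain in the recall denominator.

```python
from dataclasses import dataclass


@dataclass
class Question:
    # Hypothetical record; field names are assumptions for illustration.
    relevant_total: int  # relevant documents in the collection for this question
    retrieved: dict      # coordination level -> (relevant retrieved, total retrieved)


def average_of_numbers(questions, levels):
    """Sum the counts over all questions at each level, then form the ratios."""
    # Assumption: the recall denominator covers every question in the subset,
    # including questions that retrieve nothing at a given level.
    relevant_all = sum(q.relevant_total for q in questions)
    rows = []
    for level in levels:
        # A question with no entry at this level contributes (0, 0).
        rel_ret = sum(q.retrieved.get(level, (0, 0))[0] for q in questions)
        tot_ret = sum(q.retrieved.get(level, (0, 0))[1] for q in questions)
        recall = 100.0 * rel_ret / relevant_all if relevant_all else 0.0
        precision = 100.0 * rel_ret / tot_ret if tot_ret else 0.0
        rows.append((level, rel_ret, tot_ret, recall, precision))
    return rows


# Invented sample data: the second question gives no result at level 3.
questions = [
    Question(4, {1: (4, 20), 2: (3, 8), 3: (1, 2)}),
    Question(6, {1: (5, 30), 2: (2, 5)}),
    Question(10, {1: (9, 50), 2: (5, 12), 3: (2, 2)}),
]

rows = average_of_numbers(questions, levels=[1, 2, 3])
for level, rel_ret, tot_ret, recall, precision in rows:
    print(f"level {level}: recall {recall:.1f}%  precision {precision:.1f}%")
```

With the invented data, recall falls from 90% to 15% and precision rises from 18% to 75% as the coordination level goes from 1 to 3, which is the kind of trade-off a performance curve such as Fig. 3.21P displays.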
Since the characteristic of retrieving terms was ignored, not all the 35 questions provide results at all coordination levels, and, as was seen in Fig. 3.20T, one question is unable to retrieve any documents when more than two of the terms are demanded in coordination, and only three questions provide results at a coordination level of seven. The number of