CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Number of starting terms:    2   3   4   5   6   7   8   9  10  11  12  13  14  15   Total
Questions (column totals):   1   8  15  33  24  35  27  26  20  17   7   4   1   3     221

Number of retrieving terms:  2   3   4   5   6   7   8   9  10   Total
Questions (row totals):      6  28  56  45  36  32  10   7   1     221

[Individual cell entries of the cross-tabulation are not legible in this copy; only the marginal totals are given.]
FIGURE 3.20T
DISTRIBUTION OF THE 221 QUESTIONS BY STARTING
TERMS AND RETRIEVING TERMS, IN ONE
PARTICULAR TEST.
The table in Fig. 3.20T may be considered as showing how, in two respects,
the 221 questions form a heterogeneous set. Various subsets of the 221 can
be picked to overcome these variations. Truly homogeneous subsets occupy
the individual cells of the table; for example, the group with five
starting terms and four retrieving terms is the largest such subset, with
a total of eighteen questions. A partially homogeneous subset, having only
one characteristic in common (either starting terms or retrieving terms),
was the first to be examined in an attempt to find a method of totalling
the whole set.
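The grouping described above can be sketched as follows. This is a minimal illustration only: the question data are hypothetical values invented for the example, not the counts of Fig. 3.20T, and the function names `cross_tabulate` and `subset_by_starting_terms` are assumptions introduced here.

```python
from collections import Counter

# Hypothetical (starting_terms, retrieving_terms) pairs, one per question.
# Illustrative values only; they are NOT taken from Fig. 3.20T.
questions = [(5, 4), (5, 4), (7, 2), (7, 7), (5, 3), (6, 4), (7, 4)]


def cross_tabulate(pairs):
    """Count the questions falling in each (starting, retrieving) cell.

    Each cell is a truly homogeneous subset: its questions share both
    characteristics.
    """
    return Counter(pairs)


def subset_by_starting_terms(pairs, n):
    """Return a partially homogeneous subset: questions with n starting
    terms, whatever their number of retrieving terms."""
    return [p for p in pairs if p[0] == n]


cells = cross_tabulate(questions)
largest_cell, largest_count = cells.most_common(1)[0]
seven_term = subset_by_starting_terms(questions, 7)

print(largest_cell, largest_count)
print(seven_term)
```

A cell of the counter corresponds to one cell of the table, while the subset by starting terms corresponds to one whole column of it.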
The subset of seven-starting-term questions was chosen and totalled
by simply adding up the results for each question at each of the seven
possible coordination levels, resulting in seven totals. These totals are
shown in Fig. 3.21T, together with the recall and precision percentages,
which were calculated using the average of numbers. The seven average
recall and precision ratios are plotted in Fig. 3.21P, producing a
performance curve for the 35 questions as the exhaustivity of search is
altered by coordination level. Since the characteristic of retrieving terms
was ignored, not all of the 35 questions provide results at every
coordination level: as was seen in Fig. 3.20T, one question is unable to
retrieve any documents when more than two of the terms are demanded in
coordination, and only three questions provide results at a coordination
level of seven. The number of