CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
questions that contribute results at each coordination level is recorded
in Fig. 3.21P.
Although it has the drawback of a reduced sample size at high
coordination levels, it is suggested that totalling by starting
term groups is a valid and satisfactory method.
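The reduced sample size can be illustrated with a minimal sketch, assuming that a question with n starting terms can contribute results at coordination levels 1 to n only. The function name and the example counts below are hypothetical and are not figures from the report.

```python
from collections import Counter


def questions_per_level(starting_term_counts):
    """Count how many questions can contribute at each coordination level.

    Assumption: a question with n starting terms can contribute results
    at coordination levels 1..n and at no higher level.
    """
    contributing = Counter()
    for n_terms in starting_term_counts:
        for level in range(1, n_terms + 1):
            contributing[level] += 1
    return dict(sorted(contributing.items()))


# Hypothetical starting-term counts for five questions (not report data).
example_counts = [3, 5, 5, 7, 2]
sample_sizes = questions_per_level(example_counts)

for level, n_questions in sample_sizes.items():
    print(f"coordination level {level}: {n_questions} question(s)")
```

In this illustration the number of contributing questions can only fall as the coordination level rises, which is the effect noted above.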
On the other hand the totalling method using the retrieving term
subset does not have this reduced sample size problem, and this was
the next method to be investigated. The subset having five retrieving
terms is necessarily composed entirely of questions having five or more
starting terms; as can be seen from Fig. 3.20T, there are 45 such questions
and the results of this subset are given in Fig. 3.22TP. Here the low
recall end of the curve does not sweep to high precision values, but
stops at 26% precision at 15% recall. The main disadvantage of the
retrieving terms subset totalling is that the composition of each subset
alters whenever any language variable is introduced. This means that
the generality number will be continually changing, and it therefore
becomes more difficult to make comparisons.
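The effect of a changing subset on the generality number can be sketched as follows. This is an illustration only: it assumes totals are formed by summing the document counts over the questions in the subset, and that the generality number is the number of relevant documents per 1000 documents in the collection. The class, function name, collection size and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QuestionResult:
    relevant: int            # relevant documents in the collection
    retrieved: int           # documents retrieved at the chosen level
    relevant_retrieved: int  # relevant documents among those retrieved


def subset_totals(subset, collection_size):
    """Total a subset of questions by summing counts (assumed method)."""
    relevant = sum(q.relevant for q in subset)
    retrieved = sum(q.retrieved for q in subset)
    hits = sum(q.relevant_retrieved for q in subset)
    return {
        "recall": 100 * hits / relevant if relevant else 0.0,
        "precision": 100 * hits / retrieved if retrieved else 0.0,
        # Assumed definition: relevant documents per 1000 documents,
        # taken over every question in the subset.
        "generality": 1000 * relevant / (collection_size * len(subset)),
    }


# Hypothetical figures: the same subset before and after one more
# question enters it because a language variable has been introduced.
subset_a = [QuestionResult(5, 20, 3), QuestionResult(10, 30, 6)]
subset_b = subset_a + [QuestionResult(3, 10, 1)]

totals_a = subset_totals(subset_a, collection_size=1000)
totals_b = subset_totals(subset_b, collection_size=1000)
print(totals_a)
print(totals_b)
```

Because the two subsets differ in composition, their generality numbers differ as well, so their recall and precision figures are not measured against the same base.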
While the matter of partly homogeneous sets presented little
difficulty, the major problem lay in totalling the questions in the whole
heterogeneous set of 221 questions; the results of our investigations on
this point showed that no single method was conspicuously superior or
satisfactory for all the different test situations. Many different methods
were tried, but, with minor variations, they fell into six main groups.
These are summarised in Fig. 3.23T and described in the following pages.
Method    Description

IA        Strict Coordination Levels.
IB        Strict Coordination Levels with adjustment for
          questions having no capability of retrieving.
          Proportional Coordination Levels.
          Maximum Starting Term Coordination Levels.
          Maximum Retrieving Term Coordination Levels.
          Recall Levels of Retrieving Term Groups.
          Document Output Cutoff with ranked output
          derived from the coordination levels.

FIGURE 3.23T SUMMARY OF TOTALLING METHODS