CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Methods for presentation of results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
For Method 1, the questions were totalled in a similar manner
to the starting term groups described above. This meant that for
any given coordination level (say, for example, four terms), the
total results were obtained by adding the individual results for all the
221 questions, irrespective of the number of starting terms which each
question had. Two variants of this strict coordination level totalling
were considered. Method 1A involved totalling as described, and the
resulting performance ratios are given in Fig. 3.24T, for Single Term
Index Language I.1. The performance plot is given in Fig. 3.24P,
with an additional curve of Language I.6 for comparison. In Method
1B, account is taken of the fact that at the higher coordination levels,
many of the questions are not capable of contributing results, since
the number of starting terms in the question is fewer than the
coordination level. It is, for instance, quite impossible, at a
coordination level of seven terms, to retrieve documents related to
any of the questions which have only six, five, four, three or two
starting terms. This effect increases, of course, with the
coordination level. In this case, therefore, the recall ratio is calculated
only for the questions that are capable of giving results. Fig. 3.25TP
shows this, where it is seen that at a coordination level of 8+, only
704 relevant documents, i.e. less than half of the real total for this
set of questions, are taken as the total of relevant documents being
sought. This results in an increased recall ratio compared with Method
1A, but the precision ratio is not affected. A disadvantage of this
method is that at each coordination level a change in generality occurs.
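The difference between Methods 1A and 1B can be sketched in a few lines of Python. This is a minimal illustration only: the `Question` record, the `ratios` function and the toy figures are all invented for the example and are not taken from the report. It assumes that each question records, for every coordination level, the number of relevant documents retrieved and the total number of documents retrieved.

```python
from dataclasses import dataclass


@dataclass
class Question:
    starting_terms: int   # number of starting terms in the question
    relevant_total: int   # relevant documents sought for this question
    hits: dict            # level -> (relevant retrieved, total retrieved)


def ratios(questions, level, eligible_only):
    """Return (recall %, precision %) totalled at one coordination level."""
    relevant_retrieved = sum(q.hits.get(level, (0, 0))[0] for q in questions)
    total_retrieved = sum(q.hits.get(level, (0, 0))[1] for q in questions)
    # Method 1B (eligible_only=True) counts relevant documents only for
    # questions with at least `level` starting terms; Method 1A counts all.
    pool = [q for q in questions
            if not eligible_only or q.starting_terms >= level]
    relevant_sought = sum(q.relevant_total for q in pool)
    recall = 100 * relevant_retrieved / relevant_sought if relevant_sought else 0.0
    precision = 100 * relevant_retrieved / total_retrieved if total_retrieved else 0.0
    return recall, precision


# Invented toy data (hypothetical, not from the report).
questions = [
    Question(3, 4, {1: (4, 40), 2: (3, 10), 3: (1, 2)}),
    Question(6, 6, {1: (6, 60), 2: (5, 30), 3: (4, 12),
                    4: (3, 6), 5: (2, 3), 6: (1, 1)}),
]

method_1a = ratios(questions, level=4, eligible_only=False)
method_1b = ratios(questions, level=4, eligible_only=True)
print(method_1a)  # (30.0, 50.0)
print(method_1b)  # (50.0, 50.0)
```

In this toy case the three-term question cannot contribute at a coordination level of four, so Method 1B drops its four relevant documents from the total sought. Recall rises from 30% to 50% while precision stays at 50%, and the total of relevant documents sought now depends on the level, which is the change in generality noted above.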
In Method 2, an attempt is made to allow for the fact that questions
differ according to the number of starting terms. The strict coordination
level of Method 1 can be faulted for equating, for example, the results
of a five starting-term question searched at a coordination level of four
terms, with the results of a ten starting-term question, also searched
at four terms. The basic Method 2 can be described as 'totalling by
proportional coordination levels', since it takes into account the potential
range of coordination levels, which differs between questions. For example,
a three starting-term question searched at a coordination level of two
terms is demanding a match of two-thirds of the theoretical maximum,
and in this method all questions having such a match would be included in
the group. For a six starting-term question, for a nine starting-term
question and for a twelve starting-term question, a two-thirds match
would be four terms, six terms and eight terms respectively, although, for
most other questions, no exact two-thirds match is possible. There are
obviously many variations which are possible, but the example presented
illustrates the use of this method when seven levels of match are chosen
to obtain a total result.
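The two-thirds example can be checked with a short sketch. The function name `proportional_level` is hypothetical; the code simply tests whether a given proportion of the starting terms is a whole number of terms, using exact fractions to avoid rounding error.

```python
from fractions import Fraction


def proportional_level(starting_terms, fraction):
    """Return the coordination level that is exactly `fraction` of the
    starting terms, or None when no exact match exists."""
    level = fraction * starting_terms
    return int(level) if level.denominator == 1 else None


two_thirds = Fraction(2, 3)
matches = {n: proportional_level(n, two_thirds) for n in range(2, 13)}
for n, level in matches.items():
    print(n, level)
```

Only questions with three, six, nine and twelve starting terms yield an exact level (two, four, six and eight terms); every other question in the range returns `None`, which is why some convention for grouping the inexact cases is needed.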
Whatever the actual number of coordination levels in any particular question,
the results are forced into the seven-level pattern. As can be seen from
Figure 3.26T, this means that certain results are repeated, while for
questions with more than seven starting terms, certain results have to be
omitted.
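One way of forcing a question's coordination levels into a seven-level pattern is sketched below. The rounding rule is an assumption made purely for illustration: the text does not state the rule used for Figure 3.26T, and the function `seven_level_pattern` is hypothetical. Here pattern level k is mapped to the smallest actual level that is at least k sevenths of the question's starting terms.

```python
def seven_level_pattern(starting_terms, pattern_levels=7):
    """Map each pattern level to an actual coordination level.

    ASSUMED rule (not from the report): round k/7 of the starting terms
    up to the next whole term.
    """
    return [
        (k * starting_terms + pattern_levels - 1) // pattern_levels  # integer ceiling
        for k in range(1, pattern_levels + 1)
    ]


short_question = seven_level_pattern(3)
exact_question = seven_level_pattern(7)
long_question = seven_level_pattern(10)
print(short_question)  # [1, 1, 2, 2, 3, 3, 3]
print(exact_question)  # [1, 2, 3, 4, 5, 6, 7]
print(long_question)   # [2, 3, 5, 6, 8, 9, 10]
```

Under this assumed rule a three-term question repeats each of its results, while a ten-term question omits its results at one, four and seven terms, which reproduces the repetition and omission described above without claiming to match the report's own table.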