Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2

Methods for presentation of results
Cyril Cleverdon, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

For Method 1, the questions were totalled in a similar manner to the starting term groups described above. This meant that for any given coordination level (say, for example, four terms), the total results were obtained by adding the individual results for all the 221 questions, irrespective of the number of starting terms which each question had. Two variants of this strict coordination level totalling were considered. Method 1A involved totalling as described, and the resulting performance ratios are given in Fig. 3.24T for Single Term Index Language I.1. The performance plot is given in Fig. 3.24P, with an additional curve of Language I.6 for comparison.

In Method 1B, account is taken of the fact that at the higher coordination levels many of the questions are not capable of contributing results, since the number of starting terms in the question is fewer than the coordination level. It is, for instance, quite impossible, at a coordination level of seven terms, to retrieve documents related to any of the questions which have only six, five, four, three or two starting terms. This effect increases, of course, with the coordination level. In this case, therefore, the recall ratio is calculated only for the questions that are capable of giving results. Fig. 3.25TP shows this, where it is seen that at a coordination level of 8+, only 704 relevant documents, i.e. less than half of the real total for this set of questions, are taken as the total of relevant documents being sought.
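The difference between the two variants of Method 1 can be illustrated with a minimal sketch. The `Question` record, the `ratios` function and the two toy questions are hypothetical and are not taken from the project data; the sketch only assumes that each question records, for every coordination level, how many documents and how many relevant documents were retrieved.

```python
from dataclasses import dataclass


@dataclass
class Question:
    starting_terms: int       # number of starting terms in the question
    relevant_total: int       # relevant documents known for the question
    relevant_retrieved: dict  # coordination level -> relevant documents retrieved
    total_retrieved: dict     # coordination level -> all documents retrieved


def ratios(questions, level, method="1A"):
    """Return (recall %, precision %) totalled at one strict coordination level.

    Method 1A: relevant documents of every question count as 'sought'.
    Method 1B: only questions with at least `level` starting terms count.
    """
    if method == "1B":
        pool = [q for q in questions if q.starting_terms >= level]
    else:
        pool = list(questions)
    relevant_sought = sum(q.relevant_total for q in pool)
    # Questions with too few starting terms retrieve nothing at this level,
    # so the retrieved totals are the same under both variants.
    rel_ret = sum(q.relevant_retrieved.get(level, 0) for q in questions)
    tot_ret = sum(q.total_retrieved.get(level, 0) for q in questions)
    recall = 100 * rel_ret / relevant_sought if relevant_sought else 0.0
    precision = 100 * rel_ret / tot_ret if tot_ret else 0.0
    return recall, precision


# Toy data (invented for illustration only).
toy_questions = [
    Question(3, 4, {1: 4, 2: 3, 3: 1}, {1: 40, 2: 10, 3: 2}),
    Question(5, 6, {1: 6, 2: 5, 3: 4, 4: 2, 5: 1},
             {1: 60, 2: 25, 3: 10, 4: 4, 5: 1}),
]

recall_1a, precision_1a = ratios(toy_questions, level=4, method="1A")
recall_1b, precision_1b = ratios(toy_questions, level=4, method="1B")
```

At a coordination level of four terms, the three-term toy question cannot contribute, so the 1B variant divides by a smaller total of relevant documents while the retrieved counts stay the same.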
This results in an increased recall ratio compared with Method 1A, but the precision ratio is not affected. A disadvantage of this method is that at each coordination level a change in generality occurs.

In Method 2, an attempt is made to allow for the fact that questions differ according to the number of starting terms. The strict coordination level of Method 1 can be faulted for equating, for example, the results of a five starting-term question searched at a coordination level of four terms with the results of a ten starting-term question also searched at four terms. The basic Method 2 can be described as 'totalling by proportional coordination levels', since it takes into account the potential range of coordination levels, which differs between questions. For example, a three starting-term question searched at a coordination level of two terms is demanding a match of two-thirds of the theoretical maximum, and in this method all questions having such a match would be included in the group. For a six starting-term question, a nine starting-term question and a twelve starting-term question, a two-thirds match would be four terms, six terms and eight terms respectively, although for most other questions no exact two-thirds match is possible.

There are obviously many ways in which this method could be applied; the example presented illustrates its use when seven levels of match are chosen to obtain a total result. Whatever the actual number of coordination levels in any particular question, the results are forced into the seven-term pattern. As can be seen from Fig. 3.26T, this means that certain results are repeated, while for questions with more than seven starting terms, certain results have to be omitted.
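The idea of proportional coordination levels can be sketched as follows. The function names are hypothetical, and the rounding rule in `seven_level_pattern` (taking the ceiling of j × n / 7) is an assumption made purely for illustration; the mapping actually used in Fig. 3.26T may differ.

```python
from fractions import Fraction


def exact_level(starting_terms, proportion):
    """Coordination level that is exactly `proportion` of the starting terms,
    or None when no whole number of terms gives that proportion."""
    level = proportion * starting_terms
    return int(level) if level.denominator == 1 else None


def seven_level_pattern(starting_terms, pattern_levels=7):
    """Map each of the seven pattern levels to an actual coordination level.

    ASSUMPTION: pattern level j uses the smallest actual level covering the
    proportion j/7, i.e. ceil(j * n / 7). This is not the rule from the report.
    """
    n = starting_terms
    # -(-a // b) is integer ceiling division, avoiding floating point.
    return [-(-j * n // pattern_levels) for j in range(1, pattern_levels + 1)]


two_thirds = Fraction(2, 3)
exact_matches = {n: exact_level(n, two_thirds) for n in (3, 5, 6, 9, 12)}

short_question = seven_level_pattern(3)   # fewer than seven starting terms
long_question = seven_level_pattern(10)   # more than seven starting terms
```

Under this assumed rule, a three-term question fills the seven positions by repeating its levels, while a ten-term question skips some of its levels, which mirrors the repetition and omission described in the text.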