CRANV2
Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2
Supplementary tests and results
chapter
Cyril Cleverdon
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
If one makes the assumption that the coordination level of 3 for the
Single Term index languages is approximately equal to a coordination level
of 2 for the Simple Concept and Controlled Term index languages, then it is
possible to present in a bar chart a representation of what happens in regard
to recall and precision ratios when moving from one index language to another.
Index Language II.1.a has the lowest recall ratio and highest precision ratio,
so this is taken as the starting point in Fig. 6.16T.
Effects of precision devices
In Chapter 4, Section 3, the results of tests on the Single Term index
languages with interfixing and partitioning were presented. Figures 6.17T and
6.18T make extracts from these tables of the figures at the coordination
level of 4.
Effects of question generality
The individual results for each of the 221 questions with the 1400 document
collection and Index Language I.1.a are given in Appendix 4A, and the
figures for this particular set of results are given in Fig. 4.100T. As
discussed in Chapter 3, this set of questions was a heterogeneous group in a
number of respects; various breakdowns have now been made.
First, the questions have been grouped according to the number of
documents relevant to each question, and Table 6.19T shows the recall and
precision ratios for each of the groups.
There appears to be a general trend towards a lower recall ratio at
any given coordination level for those searches where there are increased
numbers of relevant documents; as usual this is matched by a higher precision
ratio. If the questions having 1-4 relevant documents and the questions
having 16 or more relevant documents are grouped, then this change becomes
more apparent, as is shown in Fig. 6.20T.
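The grouping described above can be illustrated with a minimal sketch. It assumes that the ratios for a group are formed from pooled totals over the questions in that group, which may not be the exact procedure used for Table 6.19T. The function names, the catch-all middle group and the four toy questions are invented for illustration and are not taken from the test results.

```python
from collections import defaultdict


def group_label(n_relevant):
    """Assign a question to a group by its number of relevant documents.

    Only the '1-4' and '16+' groups are named in the text; the middle
    group here is a hypothetical catch-all.
    """
    if n_relevant < 1:
        raise ValueError("every question needs at least one relevant document")
    if n_relevant <= 4:
        return "1-4"
    if n_relevant >= 16:
        return "16+"
    return "5-15"


def pooled_ratios(questions):
    """Return recall and precision (as percentages) per group, from pooled totals."""
    totals = defaultdict(lambda: [0, 0, 0])  # relevant, retrieved, relevant retrieved
    for q in questions:
        t = totals[group_label(q["relevant"])]
        t[0] += q["relevant"]
        t[1] += q["retrieved"]
        t[2] += q["hits"]
    return {
        group: {
            "recall": 100.0 * hits / rel,
            "precision": 100.0 * hits / ret if ret else 0.0,
        }
        for group, (rel, ret, hits) in totals.items()
    }


# Invented toy data for one coordination level (not Cranfield figures).
toy_questions = [
    {"relevant": 2, "retrieved": 10, "hits": 2},
    {"relevant": 4, "retrieved": 20, "hits": 3},
    {"relevant": 20, "retrieved": 30, "hits": 12},
    {"relevant": 16, "retrieved": 30, "hits": 6},
]

ratios = pooled_ratios(toy_questions)
for group in sorted(ratios):
    print(group, ratios[group])
```

With these invented numbers the group with many relevant documents shows the lower recall and the higher precision, which is the direction of the trend described in the text.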
However, the marked increase in the precision ratio at any given recall
ratio is obviously due to the large increase in the generality number of the
questions having 16 or more relevant documents. If one considers the fallout
ratio, it can be seen from Fig. 6.21P that when recall is plotted against
fallout, those questions which have four or fewer relevant documents have
markedly superior performance.
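The dependence of precision on generality can be made concrete with a short sketch. It assumes the usual definitions from the 2 x 2 retrieval table: recall is the fraction of relevant documents retrieved, fallout is the fraction of non-relevant documents retrieved, and generality is the fraction of the collection that is relevant (expressed here as a plain fraction rather than as a generality number). Under these definitions, precision = (recall x generality) / (recall x generality + fallout x (1 - generality)). The function name and the example values of recall and fallout are invented for illustration.

```python
def precision_from(recall, fallout, generality):
    """Precision implied by recall, fallout and generality (all as fractions)."""
    relevant_retrieved = recall * generality
    nonrelevant_retrieved = fallout * (1.0 - generality)
    return relevant_retrieved / (relevant_retrieved + nonrelevant_retrieved)


COLLECTION_SIZE = 1400  # size of the document collection used in the text

# Same (invented) recall and fallout, but different numbers of relevant documents.
p_specific = precision_from(0.5, 0.01, 2 / COLLECTION_SIZE)
p_general = precision_from(0.5, 0.01, 20 / COLLECTION_SIZE)

print(f"2 relevant documents:  precision = {p_specific:.3f}")
print(f"20 relevant documents: precision = {p_general:.3f}")
```

At identical recall and fallout the question with more relevant documents shows a much higher precision, purely because of its higher generality. This is why a comparison of recall against fallout is the fairer one when the groups differ in generality.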
It would probably be correct to say that, as a rule, a question having
few relevant documents is a specific query, while a question having a very
large number of relevant documents is likely to be a general question. From
this it is reasonable to hypothesise that a specific question should present a
simpler retrieval problem than a general question. Without suggesting that the
results presented above prove this hypothesis, they can certainly be said to
support it.
Effect of number of postings
The next breakdown of the 221 questions was made by grouping the
questions according to the numbers of total postings of the question search
terms; information on this is included with the set of results in Appendix 4A.
For instance, as can be checked from Appendix 5.1 of Volume I, the three
search terms of Question 295 (i.e. 'uniformly', 'loaded', 'sectors') have a
total of only 46 postings, while for Question 106, the nine search terms have
a total of 3,474 postings. Ten groups were formed on this basis, each