CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2 Supplementary tests and results chapter Cyril Cleverdon Michael Keen Cranfield An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

If one makes the assumption that the coordination level of 3 for the Single Term index languages is approximately equal to a coordination level of 2 for the Simple Concept and Controlled Term index languages, then it is possible to present in a bar chart a representation of what happens to the recall and precision ratios when moving from one index language to another. Index Language II.1.a has the lowest recall ratio and the highest precision ratio, so this is taken as the starting point in Fig. 6.16T.

Effects of precision devices

In Chapter 4, Section 3, the results of tests on the Single Term index languages with interfixing and partitioning were presented. Figures 6.17T and 6.18T extract from these tables the figures at the coordination level of 4.

Effects of question generality

The individual results for each of the 221 questions with the 1400 document collection and Index Language I.1.a are given in Appendix 4A, and the figures for this particular set of results are given in Fig. 4.100T. As discussed in Chapter 3, this set of questions was a heterogeneous group in a number of respects; various breakdowns have now been made. First the questions have been grouped according to the number of documents relevant to each question, and Table 6.19T shows the recall and precision ratios for each of the groups. There appears to be a general trend towards a lower recall ratio at any given coordination level for those searches where there are increased numbers of relevant documents; as usual this is matched by a higher precision ratio.
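
As a hedged illustration of the measures used in these breakdowns, the sketch below computes recall, precision and fallout ratios for one question at successive coordination levels. It assumes the conventional definitions (recall = relevant retrieved / all relevant; precision = relevant retrieved / all retrieved; fallout = non-relevant retrieved / all non-relevant) and treats the coordination level as the minimum number of question search terms a document must share to be retrieved. The names `Ratios` and `ratios_at_level`, the five-document index and its terms are hypothetical and are not taken from the test collection.

```python
from dataclasses import dataclass


@dataclass
class Ratios:
    recall: float     # relevant retrieved / all relevant
    precision: float  # relevant retrieved / all retrieved
    fallout: float    # non-relevant retrieved / all non-relevant


def ratios_at_level(index, query_terms, relevant, level):
    """Retrieve every document sharing at least `level` terms with the question.

    `index` maps a document id to its set of index terms; `relevant` is the
    set of document ids judged relevant (assumed to be a subset of `index`).
    """
    query = set(query_terms)
    retrieved = {doc for doc, terms in index.items() if len(query & terms) >= level}
    hits = len(retrieved & relevant)
    non_relevant_total = len(index) - len(relevant)
    return Ratios(
        recall=hits / len(relevant) if relevant else 0.0,
        precision=hits / len(retrieved) if retrieved else 0.0,
        fallout=(len(retrieved) - hits) / non_relevant_total if non_relevant_total else 0.0,
    )


# Hypothetical toy collection: five documents, two of them relevant.
toy_index = {
    "d1": {"boundary", "layer", "flow"},
    "d2": {"boundary", "layer"},
    "d3": {"boundary", "layer", "shock"},
    "d4": {"boundary"},
    "d5": {"panel"},
}
toy_query = ["boundary", "layer", "flow"]
toy_relevant = {"d1", "d2"}

for level in (1, 2, 3):
    r = ratios_at_level(toy_index, toy_query, toy_relevant, level)
    print(f"level {level}: recall={r.recall:.2f} "
          f"precision={r.precision:.2f} fallout={r.fallout:.2f}")
```

On this toy data, raising the coordination level never raises recall and never lowers precision, which is the inverse relationship the tables report. The numbers themselves carry no evidential weight.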
If the questions having 1-4 relevant documents and the questions having 16 or more relevant documents are grouped, then this change becomes more apparent, as is shown in Fig. 6.20T. However, the marked increase in the precision ratio at any given recall ratio is obviously due to the large increase in the generality number of the questions having 16 or more relevant documents. If one considers the fallout ratio, it can be seen from Fig. 6.21P that when recall is plotted against fallout, those questions which have four or fewer relevant documents have markedly superior performance. It would probably be correct to say that, as a rule, a question having few relevant documents is a specific query, while a question having a very large number of relevant documents is likely to be a general question. From this it is reasonable to hypothesise that a specific question should present a simpler retrieval problem than a general question. Without suggesting that the results presented above prove this hypothesis, they can certainly be said to support it.

Effect of number of postings

The next breakdown of the 221 questions was made by grouping the questions according to the total number of postings of the question search terms; information on this is included with the set of results in Appendix 4A. For instance, as can be checked from Appendix 5.1 of Volume I, the three search terms of Question 295 (i.e. 'uniformly', 'loaded', 'sectors') have a total of only 46 postings, while for Question 106, the nine search terms have a total of 3,474 postings. Ten groups were formed on this basis, each