CRANV2 Aslib Cranfield Research Project: Factors Determining the Performance of Indexing Systems: Volume 2. Conclusions chapter. Cyril Cleverdon, Michael Keen. Cranfield. An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
only be based on a match between words in the document and the question. This might be expected to favour systems using precise natural language, while user relevance decisions might logically be expected to favour systems which bring in groups of related terms. The conclusion is therefore reached that the method of obtaining relevance decisions in this test could not have been responsible for the unexpected results, since any influence it might have had would have tended to work in the opposite direction.
Without going against the above argument, Vickery (Ref. 19) rightly points out that "There are still verbal links between source document and question; the questions supplied by the author - some time after [...]ing the research - were formulated after the cited papers had been read and possibly influenced the wording of his question." This raises two separate questions: firstly, is this very different from what happens in a real-life situation, and secondly, is the effect serious enough to distort the test results? To consider the first point, experience in the evaluation test of Medlars at the National Library of Medicine has shown that the majority of questioners are already aware of certain relevant documents before asking for a search to be carried out. It therefore seems likely that, in real life, search questions must often be influenced by the terminology of relevant documents, and therefore the procedure which was adopted in this test for obtaining questions is not far removed from what normally happens.
If, however, the actions of those who prepared the search questions were significantly different from what happens in real life, then it is necessary to consider whether the results are likely to have been distorted. To determine whether this is so would require a far deeper analysis of the individual searches than has so far been done. Our own opinion is that if such an analysis were made, it would show that in the large majority of cases there had been no serious distortion, and it is difficult to believe that the few cases where it might have occurred would have been sufficient to produce the significant - and consistent - variations in performance.
The concept indexing was done by selecting from each document those concepts which appeared to be of importance. This being an intellectual task, it is not possible to argue that it was done correctly. Readers of the reports on Cranfield I will recollect that the errors of the indexers were the cause of a significant number of failures to retrieve relevant documents, but that, considered as a percentage of total indexing, they represented a very low "error rate". Usually in that test the errors were errors of omission. The higher level of indexing exhaustivity, and the longer time devoted to indexing each document, made it less likely that these would occur in this project, and some analysis of the failures to retrieve relevant documents has not revealed any significant errors in this respect. Certainly it does not seem plausible to suggest that any such errors could have influenced the comparative results. While the complete indexing was more exhaustive than would normally be the case, the assignment of an indexing weight to each concept permitted the testing of various levels of indexing exhaustivity.
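The way weights allow a single, fully exhaustive indexing to be searched at several levels of exhaustivity can be illustrated with a minimal sketch. It is not the project's actual procedure: the weight scale (1 taken as the most important concept), the sample concepts, and the names `WeightedConcept` and `index_at_exhaustivity` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedConcept:
    concept: str
    weight: int  # hypothetical scale: 1 = most important concept


def index_at_exhaustivity(concepts, max_weight):
    """Keep only the concepts whose weight is at or below max_weight.

    A low max_weight gives a less exhaustive index; the highest value
    reproduces the complete indexing.
    """
    return [c.concept for c in concepts if c.weight <= max_weight]


# Invented example document; the concepts are not taken from the test collection.
doc = [
    WeightedConcept("boundary layer", 1),
    WeightedConcept("heat transfer", 2),
    WeightedConcept("wind tunnel", 3),
]

# One derived index per assumed exhaustivity level.
levels = {w: index_at_exhaustivity(doc, w) for w in (1, 2, 3)}
```

Under these assumptions, each derived index can be searched with the same questions and the same index language, so that only the exhaustivity of indexing varies between runs.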
The test results are given in Chapter 4, Section 4, and again show that whatever the level of indexing exhaustivity might be, the effect of moving from Index Language I.1.a to Index Language I.6.a is consistent, and there is no evidence to suggest that the exhaustivity of indexing affected the comparison between different index languages.
Concerning this level of exhaustivity of indexing, it again becomes obvious that there was an optimum in regard to this particular document/question set. The lowest level of exhaustivity of indexing investigated was the search on titles only; the highest level of exhaustivity occurred with the