ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
source document', in Cranfield I and the W.R.U. Tests. Swanson, in a sample
taken from the first project, demonstrated that this biased relationship was shown
by an unusually close match between the words of the question and the titles of the
relevant documents. In his paper, Swanson gives the result of an analysis of the
terms used in a set of 100 questions and the titles of their accompanying source
documents. This was done by the Cranfield group and discussed at some length
on pages 27-32 of Ref. 2, although Swanson does not comment on this work.
Instead he proposed an admittedly more exact method which, in his view, would
retrieve the source document while also retrieving only a small number of
irrelevant documents. To do this, he took the 100 document titles given in
Appendix 4B, and made a list of all the terms which did not occur more than once.
From this he argued that, if such a term also occurs in the matching question, then
the document would be retrieved, with an average of 60 other documents also being
retrieved. This statement is incorrect, in that Swanson bases it on the view that
there were only 6,000 documents in the index searched, whereas there were 18,000,
so a search of the nature proposed might be expected to retrieve an average of 180
documents.
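Swanson's procedure, as described above, can be sketched in outline. The following is a minimal illustration rather than his actual program: the tokenisation (lower-casing and whitespace splitting) and the miniature titles and questions are invented for the purpose.

```python
from collections import Counter

def single_occurrence_retrieval(titles, questions):
    """Count a source document as 'retrieved' when its title shares with
    its matching question a term occurring in only one title overall."""
    # Term frequencies across all titles (at most one count per title).
    term_counts = Counter(w for t in titles for w in set(t.lower().split()))
    retrieved = []
    for i, (title, question) in enumerate(zip(titles, questions)):
        shared = set(title.lower().split()) & set(question.lower().split())
        if any(term_counts[w] == 1 for w in shared):
            retrieved.append(i)
    return retrieved

# Invented miniature example: question 0 shares the once-occurring term
# 'carburising' with its source title, so only document 0 is retrieved.
titles = ["carburising of molybdenum steels",
          "fatigue of titanium alloys"]
questions = ["effects of carburising on steel hardness",
             "creep behaviour of aluminium"]
print(single_occurrence_retrieval(titles, questions))  # [0]

# Expected size of each retrieved set: a term occurring once per 100
# titles should occur about 60 times in a 6,000-document index
# (Swanson's assumption) but about 180 times in the 18,000 actually searched.
print(6000 // 100, 18000 // 100)  # 60 180
```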
However, using this method, Swanson finds a close correlation between the
results of his 100 searches and the actual search results, and goes on to imply that
the use of questions based on source documents will give predictable results.
To find whether these results could be repeated, we carried out the same pro-
cedure with the 114 questions and source documents of the W.R.U. test, as given in
Appendices 2a and 2b of Ref. 3. This procedure gave 232 terms, of which 132 occurred
only once. The result of this analysis was to show that 38 documents would have
been retrieved by the use of a key term occurring not more than once, this repre-
senting a recall ratio of 33%, as against the 85% recall achieved by the Cranfield
facet index. On the other hand, assuming that each key term occurring once in 114
documents would occur on an average of nine times in the whole collection, this
method would have given a maximum precision ratio of 11% as against 16% achieved
by Cranfield. Such a precision ratio of 11% could, of course, only be achieved by
the hindsight of selecting the correct term and no other. For instance, Q. 107
'Effects of increasing molybdenum content by carburising steels' is counted as a
success because 'carburising' occurs in both the question and the document title.
However, 'molybdenum' also meets the single-occurrence requirement, so it would have
retrieved the source document for Q. 21, which would have been completely non-relevant. This
effect would probably reduce the relevance ratio to less than 5%, but even so, the
performance obtained by this method is vastly inferior to that of the Cranfield
index, and appears to make Swanson's criticisms untenable.
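The recall and precision figures quoted above follow from simple ratios, and the arithmetic can be checked directly (all numbers are those given in the text):

```python
# Recall: 38 of the 114 W.R.U. source documents would have been
# retrieved by a key term occurring not more than once.
recall = 38 / 114
print(f"recall = {recall:.0%}")        # 33%, against 85% for the facet index

# Best-case precision: a term occurring once among the 114 source
# documents is taken to occur about nine times in the whole collection,
# so at most 1 in ~9 retrieved documents is the relevant one.
precision = 1 / 9
print(f"precision = {precision:.0%}")  # 11%, against 16% for Cranfield
```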
There would appear to be three possible reasons for the difference in results
of the similar tests done by Swanson and at Cranfield. Firstly, the W.R.U. collection
was narrower in subject coverage than the collection of the first Aslib-Cranfield
project. For instance, one key word given by Swanson is 'Titanium'. Since only some
300 documents in the whole collection dealt with metallurgical subjects, such a term
is clearly unlikely to occur more than once in a hundred documents, whereas in the
W.R.U. count it occurred on eight occasions. (This is an aspect of the generality
ratio discussed later.)
A second reason could be a significant difference in the quality of titles. Many
documents in the first Aslib-Cranfield test were research reports, with titles which
were fuller than usually occur in commercial journals, from which many documents
were taken for the W.R.U. test.