ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
source document', in Cranfield I and the W.R.U. Tests. Swanson, in a sample
taken from the first project, demonstrated that this biased relationship was shown
by an unusually close match between the words of the question and the titles of the
relevant documents. In his paper, Swanson gives the result of an analysis of the
terms used in a set of 100 questions and the titles of their accompanying source
documents. This was done by the Cranfield group and discussed at some length
on pages 27-32 of Ref. 2, although Swanson does not comment on this work.
Instead he proposed an admittedly more exact method which, in his view, would
retrieve the source document while also retrieving only a small number of
irrelevant documents. To do this, he took the 100 document titles given in
Appendix 4B, and made a list of all the terms which did not occur more than once.
From this he argued that, if such a term also occurs in the matching question, then
the document would be retrieved, with an average of 60 other documents also being
retrieved. This statement is incorrect, in that Swanson bases it on the view that
there were only 6,000 documents in the index searched, whereas there were 18,000,
so a search of the nature proposed might be expected to retrieve an average of 180
documents.
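Swanson's procedure, as described above, can be sketched in outline. The following is a minimal illustration rather than his actual program: the tokenisation (lower-casing and whitespace splitting) and the miniature titles and questions are invented for the purpose.

```python
from collections import Counter

def single_occurrence_retrieval(titles, questions):
    """Count a source document as 'retrieved' when its title shares with
    its matching question a term occurring in only one title overall."""
    # Term frequencies across all titles (at most one count per title).
    term_counts = Counter(w for t in titles for w in set(t.lower().split()))
    retrieved = []
    for i, (title, question) in enumerate(zip(titles, questions)):
        shared = set(title.lower().split()) & set(question.lower().split())
        if any(term_counts[w] == 1 for w in shared):
            retrieved.append(i)
    return retrieved

# Invented miniature example: question 0 shares the once-occurring term
# 'carburising' with its source title, so only document 0 is retrieved.
titles = ["carburising of molybdenum steels",
          "fatigue of titanium alloys"]
questions = ["effects of carburising on steel hardness",
             "creep behaviour of aluminium"]
print(single_occurrence_retrieval(titles, questions))  # [0]

# Expected size of each retrieved set: a term occurring once per 100
# titles should occur about 60 times in a 6,000-document index
# (Swanson's assumption) but about 180 times in the 18,000 actually searched.
print(6000 // 100, 18000 // 100)  # 60 180
```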
However, using this method, Swanson finds a close correlation between the
results of his 100 searches and the actual search results, and goes on to imply that
the use of questions based on source documents will give predictable results.
To find whether these results could be repeated, we carried out the same pro-
cedure with the 114 questions and source documents of the W.R.U. test, as given in
Appendices 2a and 2b of Ref. 3. This procedure gave 232 terms, of which 132 occurred
only once. The result of this analysis was to show that 38 documents would have
been retrieved by the use of a key term occurring not more than once, this repre-
senting a recall ratio of 33%, as against the 85% recall achieved by the Cranfield
facet index. On the other hand, assuming that each key term occurring once in 114
documents would occur on an average of nine times in the whole collection, this
method would have given a maximum precision ratio of 11% as against 16% achieved
by Cranfield. Such a precision ratio of 11% could, of course, only be achieved by
the hindsight of selecting the correct term and no other. For instance, Q. 107
'Effects of increasing molybdenum content by carburising steels' is counted as a
success because 'carburising' occurs in both the question and the document title.
However, 'molybdenum' also meets the single-occurrence requirement, so it would have
retrieved the source document for Q. 21, which would have been completely non-relevant. This
effect would probably reduce the relevance ratio to less than 5%, but even so, the
performance obtained by this method is vastly inferior to that of the Cranfield
index, and appears to make Swanson's criticisms untenable.
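The recall and precision figures quoted above follow from simple ratios, and the arithmetic can be checked directly (all numbers are those given in the text):

```python
# Recall: 38 of the 114 W.R.U. source documents would have been
# retrieved by a key term occurring not more than once.
recall = 38 / 114
print(f"recall = {recall:.0%}")        # 33%, against 85% for the facet index

# Best-case precision: a term occurring once among the 114 source
# documents is taken to occur about nine times in the whole collection,
# so at most 1 in ~9 retrieved documents is the relevant one.
precision = 1 / 9
print(f"precision = {precision:.0%}")  # 11%, against 16% for Cranfield
```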
There would appear to be three possible reasons for the difference in results
of the similar tests done by Swanson and at Cranfield. Firstly, the W.R.U. collection
was narrower in subject coverage than the collection of the first Aslib-Cranfield
project. For instance, one key word given by Swanson is 'Titanium'. Since only some
300 documents in the whole collection dealt with metallurgical subjects, such a term
is clearly unlikely to occur more than once in a hundred documents, whereas in the
W.R.U. count it occurred on eight occasions. (This is an aspect of the generality
ratio discussed later.)
A second reason could be a significant difference in the quality of titles. Many
documents in the first Aslib-Cranfield test were research reports, with titles which
were fuller than usually occur in commercial journals, from which many documents
were taken for the W.R.U. test.