CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
A separate investigation, to extract similar information more thoroughly, might
be of value, particularly if the author supplied reasons why each paper was, or was
not, relevant. Comments on relevance itself, in the match between the questions
and the documents, are made later.
The authors' assessments of the additionally submitted documents might be
expected to have suffered a little in reliability, due to the time lag between the first
letter and the second, and due to the additional documents being supplied as abstracts
only. However, some authors would be expected to have been aware of some of the
additional documents and, having the full bibliographical details, could examine the
full text if they wished. Of the 201 questions for which additional documents were
submitted, 39 were returned with all the additional documents assessed as not rele-
vant, leaving 162 questions for which one or more of the documents were relevant. Several
authors indicated a continuing interest in the problem of their own paper, and the
quick response to the second questionnaire may indicate that the time lag was not a
problem.
The large and difficult task undertaken by the students must next be examined.
Some error would be expected in any job of this kind, and two pieces of evidence may
indicate the number of documents missed.
1. Of the 198 documents found only by Bibliographic Coupling, 119 were assessed
as relevant by the authors (see Tables 3.4 and 3.5). Only one was graded
as relevance (1), and the majority were graded (3).
2. In cases where an author had given more than one question that we were using,
and also where we submitted additional relevant documents in relation to more than
one of the questions, all documents submitted were listed together on a sheet with
an indication given against each document of the question to which that document was
judged to be relevant (see Appendix 3.2). However, there were cases when an author
considered that a document which had been submitted in relation to one of the ques-
tions only was also relevant to another of his questions. This occurred in 32 ques-
tions, and involved a total of 75 documents.
This last fact means that the figures in Table 3.4 referring to the additional
student-assessed papers include these 75 documents, and the corrected figure for
documents selected by the students is 842. Of these, 517 were accepted as relevant,
giving an acceptance rate of 61.4% as against the previous figure of 64.6%.
Together with the Bibliographic Coupling documents that were accepted, a total
of 194 relevant documents were missed by the students, which means that they found
517 of the 711 that were assessed as relevant, i.e. 73%. Possible reasons for
this known loss of 27% are:-
1. The students' interpretation of the question was more strict than that of the author,
resulting in the students rejecting what the authors may have accepted.
2. The magnitude of the task and the inevitable occurrence of human error.
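As a sanity check, the acceptance rate and overall recall quoted above can be recomputed from the totals given in the text. The following is a minimal sketch; the variable names are ours, and the figures are those stated in the preceding paragraphs:

```python
# Totals reported in the text (after the 75 cross-question documents
# were added to the student-selected set).
selected_by_students = 842   # corrected total of student-selected documents
accepted_as_relevant = 517   # of those, accepted by the authors
total_relevant = 711         # all documents finally assessed as relevant

acceptance_rate = accepted_as_relevant / selected_by_students
recall = accepted_as_relevant / total_relevant
missed = total_relevant - accepted_as_relevant

print(f"acceptance rate: {acceptance_rate:.1%}")  # 61.4%
print(f"recall:          {recall:.0%}")           # 73%
print(f"documents missed: {missed}")              # 194
```

The 194 missed documents reconcile with the two sources noted earlier: the 119 relevant documents found only by Bibliographic Coupling plus the 75 cross-question documents.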
We may hypothesise that if the students' interpretation of the question had been
more liberal, a large number of possibly relevant documents would have been selec-
ted, resulting in a difficult task of assessment for the authors, and thereby perhaps