ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Table 3.6 shows a considerable difference between the basic and supplementary
questions. 72.2% of all documents submitted to the basic questions were accepted as
relevant, but for the supplementary questions acceptance was 57.6[OCRerr]0. Such a dif-
ference might be expected in the case of the cited documents, since more of the
references in an author's paper are likely to be included as relevant to the basic
question, but the same proportional difference in acceptance appears in all the additional documents submitted as well (see Table 3.6). A possible explanation is the authors' probably different attitudes towards the basic and
supplementary problems. In the case of the basic problem no one complete answer
would be available, and any document that shed some light on the problem, even if
only remotely, would be likely to be accepted. The supplementary problem had more
often been solved satisfactorily some time previously, and the author would there-
fore want to accept only those documents which dealt with the problem in a way that
met his particular requirements.
Individual relevance assessments, made by 182 different people with no personal interaction with the project staff, cannot be entirely consistent. However,
the assessments were made by experts in their subject, and represent the individual
and personal needs of the people concerned - the situation in which every information
retrieval system has to operate. The evidence appears to show that the assessments
were carefully done, although the task was sometimes difficult; as one author said:-
"Relevance assessment is not easy, but I have
done the best I can. In the case of this subject matter, the
literature is so extensive that the chances of a relative
newcomer picking out what mattered would be very poor;
much of what are, in this connection, significant details
have not been published anyway; even more important per-
haps is that only long association with such a subject,
both academically and experimentally, can enable one to
appreciate what is useful and to judge what is misleading,
unreliable or definitely faulty. "
The use of four relevance grades might appear too fine a distinction to make in practice, but quite a number of the authors indicated '½' grades,
i.e. (1-2), etc. For the testing stage we accepted these documents at the lower grade.
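The rounding-down rule for '½' grades can be sketched as follows; the function name is ours, and we assume (it is not spelled out here) that grade 1 denotes the highest relevance, so that "the lower grade" is the numerically larger figure:

```python
def resolve_half_grade(grade: str) -> int:
    """Resolve an author's '½' grade such as '1-2' to a single grade.

    Assumption: grade 1 is the most relevant, so the lower grade of a
    pair is the numerically larger value.
    """
    parts = [int(p) for p in grade.split("-")]
    return max(parts)

# A plain single grade passes through unchanged.
assert resolve_half_grade("3") == 3
# A half-grade is accepted at the less relevant of its two grades.
assert resolve_half_grade("1-2") == 2
```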
The definitions of the grades were a problem for one author:-
"Actually . . . none of your definitions (1), (2),
(3), (4), (5) fits my attitude toward the references. All
of the references were of considerable interest to me
because they showed me what people had done so far, how
recently, and by what methods. None was useful in
suggesting methods of tackling the problem. I already
knew all of the mathematical procedures that had been
used in the papers, and several that had not been employed.
To a large extent, it was interesting to find how little had
been done, and in some cases, how inadequately. "
Another author suggested that papers containing new or original answers to a
problem should have a separate grade, and several authors indicated that a given
document was a complete answer to their question, but an incorrect one. One new
idea for assessing relevance was suggested:-