CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Q. 247 when received was worded "Can the hypersonic similarity results be applied
to the technique". By examination of the other supplementary and basic questions
(Q. 13, Q. 12), it is seen that the technique under investigation is methods for
predicting surface pressures of an ogive forebody at angle of attack, so Q. 247 was
rewritten to include this. When, as in this example, the meaning was quite obvious,
we inserted the missing words, and the re-submission of the question to the authors
in stage three revealed no disagreement with the amendments.
The next task was to discover whether the collection contained documents, other
than those in the lists of citations, which were also relevant to any of the
questions. This was done by examining every document in relation to every question,
noting any new documents judged as possibly relevant, and then submitting these
documents to the original authors for their final assessment of relevance.
The task was performed by students with a knowledge of aerodynamics who were
engaged in post-graduate study at the College of Aeronautics. Over 1,500 man-hours
of effort during the 1963 summer vacation were put in by five people. The job
involved in theory over half a million individual judgements, and was an extremely
onerous task. The questions were supplied on individual slips, with space given for
recording the file number of any document judged as relevant. Access was also
given to the original forms giving all the questions supplied by the author, the source
document, and the authors' relevance assessment of the cited papers. Details of
the document collection were supplied in the form of typed sheets, listing the docu-
ments in file order, and giving authors, titles and bibliographical details. Complete
copies of all the documents were readily available to the students.
The ultimate procedure adopted was to work on sections of the document col-
lection, ranging from 100 to 400 documents, depending on the number of people work-
ing at the same time. The questions were first sorted into broad subject groups,
and small batches of very similar questions were done together. Thus some of the
prominent features and subject areas of sections of the documents were soon
committed to memory, to assist fairly rapid scanning of the document lists. The
document titles were examined first, and any documents that could remotely contain
material connected with the question were recorded on the question slip, so that at
the end of a 'scan' of the titles, the documents themselves could be examined. The students
were instructed to be quite liberal in their judgements, and to include documents that
they considered were only possibly relevant. An initial attempt was made to grade
their decisions for relevance, but this was found to be too difficult to do consistently,
and so was given up.
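The two-pass procedure described above, a rapid title scan to shortlist candidate documents followed by examination of the documents themselves, with deliberately liberal judgements, can be sketched as a modern analogue. The function names, the sample documents, and the simple keyword-matching rule below are purely illustrative assumptions, not part of the original study's method:

```python
# Hypothetical sketch of the students' two-pass scanning procedure: a cheap
# title scan shortlists candidates, then the full text of each shortlisted
# document is examined. All names and matching rules are illustrative only.

def title_scan(question_terms, documents):
    """First pass: keep any document whose title shares a term with the
    question -- deliberately liberal, as the students were instructed."""
    return [d for d in documents
            if any(t in d["title"].lower() for t in question_terms)]

def examine(question_terms, candidates):
    """Second pass: inspect the full text of each shortlisted document and
    keep those that still look even possibly relevant."""
    return [d for d in candidates
            if any(t in d["text"].lower() for t in question_terms)]

docs = [
    {"title": "Surface pressures on ogive forebodies",
     "text": "hypersonic similarity applied to ogive bodies"},
    {"title": "Boundary layer transition",
     "text": "transition measurements on flat plates"},
]
terms = ["ogive", "hypersonic"]

shortlist = title_scan(terms, docs)          # one title matches
possibly_relevant = examine(terms, shortlist)
print(len(possibly_relevant))  # 1
```

The point of the two stages is economy: the title scan is far cheaper than reading documents, so the expensive second pass runs only over a small shortlist, mirroring how the students scanned typed title lists before fetching the documents themselves.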
The task was tedious, particularly for people of such intellectual capability, but
361 questions were finally completed. Those who carried out the task would not claim
to have found every possibly relevant document, since question interpretation would
not always agree completely with the authors' real need, and since human error was
inevitable. Some figures on the number of relevant documents missed by the
students are given later in this chapter. Documents judged as relevant,
which really were not, did not cause any difficulty, since the original author of the
question was taken as the final arbiter of relevance. For 86 of the 361 questions, no
other documents were considered to be relevant; for the other 275 questions, at
least one document was judged as possibly relevant, with an average of 3.3 per
question.
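The closing figures can be checked with a short calculation. The numbers below are simply those quoted in the text; the "implied total" is an inference of my own, assuming the average of 3.3 is taken over the 275 questions that yielded new documents:

```python
# Figures quoted in the text; the implied total is an inference, assuming
# the average of 3.3 possibly-relevant documents is taken over the 275
# questions for which at least one new document was found.
questions_total = 361
questions_with_no_new_relevant = 86
questions_with_new_relevant = 275
avg_possibly_relevant = 3.3

# The two groups together should account for every completed question.
assert questions_with_no_new_relevant + questions_with_new_relevant == questions_total

# Implied count of newly identified possibly-relevant documents:
# 275 * 3.3 = 907.5, i.e. roughly 900 additional relevance judgements
# submitted to the original question authors for confirmation.
implied_total = questions_with_new_relevant * avg_possibly_relevant
print(implied_total)
```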