CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Q. 247 when received was worded "Can the hypersonic similarity results be applied
to the technique". By examination of the other supplementary and basic questions
(Q. 13, Q. 12), it is seen that the technique under investigation is methods for
predicting surface pressures of an ogive forebody at angle of attack, so Q. 247 was
rewritten to include this. When, as in this example, the meaning was quite obvious,
we inserted the missing words, and the re-submission of the question to the authors
in stage three revealed no disagreement with the amendments.
The next task was to discover whether the collection contained documents, other
than those in the lists of citations, which were also relevant to any of the
questions. This was done by examining every document in relation to every question,
noting any new documents judged as possibly relevant, and then submitting these
documents to the original authors for their final assessment of relevance.
The task was performed by students with a knowledge of aerodynamics who were
engaged in post-graduate study at the College of Aeronautics. Over 1,500 man-hours
of effort during the 1963 summer vacation were put in by five people. The job
involved in theory over half a million individual judgements, and was an extremely
onerous task. The questions were supplied on individual slips, with space given for
recording the file number of any document judged as relevant. Access was also
given to the original forms giving all the questions supplied by the author, the source
document, and the authors' relevance assessment of the cited papers. Details of
the document collection were supplied in the form of typed sheets, listing the docu-
ments in file order, and giving authors, titles and bibliographical details. Complete
copies of all the documents were readily available to the students.
The ultimate procedure adopted was to work on sections of the document col-
lection, ranging from 100 to 400 documents, depending on the number of people work-
ing at the same time. The questions were first sorted into broad subject groups,
and small batches of very similar questions were done together. Thus some of the
prominent features and subject areas of sections of the documents were soon
committed to memory, to assist fairly rapid scanning of the document lists. The
document titles were examined first, and any documents that could remotely contain
material connected with the question were recorded on the question slip, so that at
the end of a 'scan' of the titles, the documents themselves could be examined. The students
were instructed to be quite liberal in their judgements, and to include documents that
they considered were only possibly relevant. An initial attempt was made to grade
their decisions for relevance, but this was found to be too difficult to do consistently,
and so was given up.
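The two-pass procedure described above, a rapid title scan to shortlist candidate documents followed by examination of the documents themselves, with deliberately liberal judgements, can be sketched as a modern analogue. The function names, the sample documents, and the simple keyword-matching rule below are purely illustrative assumptions, not part of the original study's method:

```python
# Hypothetical sketch of the students' two-pass scanning procedure: a cheap
# title scan shortlists candidates, then the full text of each shortlisted
# document is examined. All names and matching rules are illustrative only.

def title_scan(question_terms, documents):
    """First pass: keep any document whose title shares a term with the
    question -- deliberately liberal, as the students were instructed."""
    return [d for d in documents
            if any(t in d["title"].lower() for t in question_terms)]

def examine(question_terms, candidates):
    """Second pass: inspect the full text of each shortlisted document and
    keep those that still look even possibly relevant."""
    return [d for d in candidates
            if any(t in d["text"].lower() for t in question_terms)]

docs = [
    {"title": "Surface pressures on ogive forebodies",
     "text": "hypersonic similarity applied to ogive bodies"},
    {"title": "Boundary layer transition",
     "text": "transition measurements on flat plates"},
]
terms = ["ogive", "hypersonic"]

shortlist = title_scan(terms, docs)          # one title matches
possibly_relevant = examine(terms, shortlist)
print(len(possibly_relevant))  # 1
```

The point of the two stages is economy: the title scan is far cheaper than reading documents, so the expensive second pass runs only over a small shortlist, mirroring how the students scanned typed title lists before fetching the documents themselves.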
The task was tedious, particularly for people of such intellectual capability, but
361 questions were finally completed. Those who carried out the task would not claim
to have found every possibly relevant document, since question interpretation would
not always agree completely with the authors' real need, and since human error was
inevitable. Some figures on the number of relevant documents missed by the
students are given later in this chapter. Documents judged as relevant,
which really were not, did not cause any difficulty, since the original author of the
question was taken as the final arbiter of relevance. For 86 of the 361 questions, no
other documents were considered to be relevant; for the other 275 questions, at
least one document was judged as possibly relevant, with an average of 3.3 per
question.
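The closing figures can be checked with a short calculation. The numbers below are simply those quoted in the text; the "implied total" is an inference of my own, assuming the average of 3.3 is taken over the 275 questions that yielded new documents:

```python
# Figures quoted in the text; the implied total is an inference, assuming
# the average of 3.3 possibly-relevant documents is taken over the 275
# questions for which at least one new document was found.
questions_total = 361
questions_with_no_new_relevant = 86
questions_with_new_relevant = 275
avg_possibly_relevant = 3.3

# The two groups together should account for every completed question.
assert questions_with_no_new_relevant + questions_with_new_relevant == questions_total

# Implied count of newly identified possibly-relevant documents:
# 275 * 3.3 = 907.5, i.e. roughly 900 additional relevance judgements
# submitted to the original question authors for confirmation.
implied_total = questions_with_new_relevant * avg_possibly_relevant
print(implied_total)
```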