ASLIB Cranfield Research Project
Factors Determining the Performance of Indexing Systems
Volume 1: Design, Part 1. Text Documents and Questions
Cyril Cleverdon, Jack Mills, Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation. Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.

902 were relevance (3), and 427 were relevance (4). In terms of an average question, one can read off the figures as 11.1 submitted, 4.1 rejected, 7.0 accepted, and so on.

Examining the different origins of the documents in turn, the cited papers are seen to exceed all the other categories in size. From this group 4.5 documents per question were assessed as relevant; the additional groups of documents added another 2.5, making an average of seven relevant documents for each question. 63.4% of the cited documents submitted were accepted as relevant, and this seems satisfactory when it is remembered that not all the references cited would be relevant to all the questions given: in many cases a reference is relevant to only one of the questions and not relevant to the others at all. Table 3.5 shows that 14% of the relevant documents were graded as relevance (1); further details on this are given in the discussion of Table 3.7.

The additional papers that the students judged as relevant totalled 917. These are not, of course, 917 unique documents, as one document might be relevant to several questions. The acceptance rate was 64.6%, and this may be taken as a clue to the success of this difficult task, but further details are given when Tables 3.6 and 3.7 are examined, and when comment is made on the success of the students' judgements.
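The per-question figures above are simple derivations from the collection totals. A minimal sketch (not part of the original study; the totals below are hypothetical values chosen only to reproduce the averages quoted in the text) shows the arithmetic:

```python
# Sketch of the arithmetic behind the per-question averages and
# acceptance rates quoted in the text. NOT the report's own program:
# submitted_total and accepted_total are illustrative reconstructions.

NUM_QUESTIONS = 279  # total questions in the collection


def per_question_average(total: float, n_questions: int = NUM_QUESTIONS) -> float:
    """Average number of documents per question: total / number of questions."""
    return round(total / n_questions, 1)


def acceptance_rate(accepted: int, submitted: int) -> float:
    """Percentage of submitted documents judged relevant by the authors."""
    return round(100 * accepted / submitted, 1)


# Hypothetical totals implied by the quoted averages of 11.1 and 7.0.
submitted_total = 11.1 * NUM_QUESTIONS   # about 3097 documents submitted
accepted_total = 7.0 * NUM_QUESTIONS     # about 1953 documents accepted

print(per_question_average(accepted_total))  # 7.0 accepted per question
print(acceptance_rate(592, 917))             # 64.6 (students' additional papers)
```

The 64.6% figure for the students' papers follows directly from the counts given later in the text (592 accepted of 917 submitted).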
Of the 592 accepted, only 12 (2%) were graded at relevance (1), so in most cases the authors considered these additional papers not as relevant as the cited ones about which they already knew.

The additional bibliographic coupling documents, submitted because they had seven or more of their references in common with the cited papers of relevance (1), (2) or (3), were only those which had not already been selected by the students as possibly relevant (see Chapter 7). Table 3.8 shows that of the 312 documents retrieved by bibliographic coupling, 87 were cited papers and 12 were base documents; of the remainder, only 15* had been selected by the students as possibly relevant, leaving a balance of 198 further documents to be submitted to the authors. The acceptance rate of these was 60.1%, a little lower than the acceptance of the students' documents, and only a single document of the 110 accepted was graded relevance (1).

In assessing all the additional relevant documents submitted, the authors did not know which had been selected by the students and which were retrieved by bibliographic coupling. The variations in the acceptance rate by the authors for the different categories (see the final column of Table 3.4) are so slight that they are not statistically significant. However, there is a significant difference in the proportion of documents put into the various relevance grades. From Table 3.5 it can be seen that 41% of the cited papers were included in grades (1) and (2); of the additional relevant papers found by the students, only 18% were put in those grades, as were 15% of those revealed by citation indexing. That so many of these additional references were placed in relevance grades (3) and (4) may be because the authors already knew of many of the additional papers but had selected the cited ones as being the most relevant to include in their papers.
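The balance of 198 documents in Table 3.8 is the result of simple bookkeeping: from the 312 bibliographic-coupling retrievals, the papers already accounted for elsewhere are removed. A short sketch of that subtraction:

```python
# Bookkeeping behind the Table 3.8 figures quoted in the text: of the
# 312 documents retrieved by bibliographic coupling, those already
# accounted for (cited papers, base documents, student selections) are
# removed, leaving the balance actually submitted to the authors.

retrieved = 312         # total retrieved by bibliographic coupling
already_cited = 87      # already among the cited papers
base_documents = 12     # the base documents themselves
student_selected = 15   # already selected by the students

newly_submitted = retrieved - already_cited - base_documents - student_selected
print(newly_submitted)  # 198, matching the balance given in the text
```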
So far the figures have been derived from the total set of 279 questions, but, as previously stated, the questions fall into two groups. The authors had been asked to give the one basic question that gave rise to their work, and then to give any supplementary questions that came up during the progress of the work. Of the 279 questions, 118 are basic and 161 supplementary. In order to discover whether the authors' assessments of their basic questions were in any way different from those of the supplementary questions, the same figures from Tables 3.4 and 3.5 are set out again in Tables 3.6 and 3.7, now divided into the two categories of questions.

*The 15 documents which were both selected by the students and retrieved by bibliographic coupling might be expected to have a higher […]