ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
Most authors included the weighting of the questions in their reply, over half of the questions had some alternative terms added, and 28 of the questions were submitted in rephrased form. (See Appendix 3B.)
A summary of the position regarding the questions is as follows:-
1. Total of questions received .............. 641
2. Questions discarded for various reasons .............. 280
3. Questions matched against complete document collection for relevance ((1) - (2)) .............. 361
4. Questions having no additional relevant references .............. 78
5. Questions resubmitted to authors for relevance decisions .............. 283
6. Questions returned by authors from stage (5) .............. 201
7. Questions available for test ((4) + (6)) .............. 279
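The stage-by-stage accounting above can be verified arithmetically; a minimal sketch (the variable names are paraphrases of the stages, not terms from the report):

```python
# Bookkeeping for the question-collection stages listed above.
received = 641                                   # stage (1)
discarded = 280                                  # stage (2)
matched = received - discarded                   # stage (3): (1) - (2)
no_additional_relevant = 78                      # stage (4)
resubmitted = matched - no_additional_relevant   # stage (5)
returned = 201                                   # stage (6)
available = no_additional_relevant + returned    # stage (7): (4) + (6)

print(matched, resubmitted, available)  # 361 283 279
```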
The relevance assessments
The basic data on the authors' relevance assessments are given in Tables 3.4, 3.5, 3.6 and 3.7. These tables highlight various aspects of the relevance assessments, and the figures given are taken from the 279 usable questions obtained. In each table, the documents that were submitted to the authors are split into three categories:-
1. Those cited in the author's own original paper;
2. Those the students found and judged as being relevant;
3. Those retrieved by bibliographic coupling at a strength of 7 plus, and which were
additional to the two categories above.
Each table also gives a figure for the total of all categories, the four divisions
being shown as the left hand parameter in each table. The relevance assessments
made are given in the body of the tables, these being split into several categories:-
1. Documents submitted (Tables 3.4 and 3.6)
2. Documents assessed as relevant, i.e. accepted:-
   (a) Totals (Tables 3.4, 3.5, 3.6 and 3.7)
   (b) Details of the four grades of relevance (Tables 3.5 and 3.7)
3. Documents assessed as not relevant, i.e. rejected (Tables 3.4 and 3.6)
4. Total documents assessed as relevant, expressed as a percentage of documents submitted (Tables 3.4 and 3.6).
The figures given are in two forms in each table:-
1. Grand totals of documents, resulting from the whole set of questions involved.
2. Figures for one average question, calculated by the arithmetic mean. These
averages are correct to one decimal place, but in a few cases a slight adjustment
has been made to preserve the correct totals.
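The report does not state the exact adjustment procedure used to make the rounded per-question averages still sum to the correct totals; a common approach that achieves this is the largest-remainder method, sketched here purely as an illustration:

```python
# Illustration only: round values to one decimal place while forcing the
# rounded parts to sum to the rounded total (largest-remainder method).
# The report's own adjustment procedure is not specified.

def round_preserving_total(values, decimals=1):
    scale = 10 ** decimals
    scaled = [v * scale for v in values]
    floors = [int(s) for s in scaled]          # round everything down first
    shortfall = round(sum(scaled)) - sum(floors)
    # hand the remaining tenths to the entries with the largest remainders
    order = sorted(range(len(values)),
                   key=lambda i: scaled[i] - floors[i], reverse=True)
    for i in order[:shortfall]:
        floors[i] += 1
    return [f / scale for f in floors]

parts = [1.23, 2.31, 3.46]                     # true values sum to 7.00
print(round_preserving_total(parts))           # [1.2, 2.3, 3.5]
```

Naive rounding of each value independently would give [1.2, 2.3, 3.5] summing correctly here, but in general per-entry rounding can drift from the rounded total by a few tenths, which is the "slight adjustment" the text alludes to.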
Tables 3.4 and 3.5, giving the figures for the whole set of 279 questions, will be examined first. The bottom section of Table 3.4 shows that 3,087 documents were submitted to the authors, of which 1,126 were rejected as not relevant and 1,961 (i.e. 63.5%) were accepted as relevant. Table 3.5 gives a breakdown of the 1,961 documents accepted, showing that 171 were graded relevance (1), 461 were relevance (2),