CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Documents and Questions
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
When submitting these documents to the authors, it was decided to add some
extra documents which had been suggested as a result of a test of the questions by
the technique known as bibliographic coupling. A description of the processing of
the cited papers in the documents of the collection, which resulted in a citation
index and bibliographic coupling groups, is given in chapter 7. In the theory of
bibliographic coupling, as worked out by Dr. M. M. Kessler (Ref. 20), it is shown
that, as the coupling strength increases, so also does the probability of the document
being relevant to the question. It was therefore decided to include all documents
retrieved by bibliographic coupling at a coupling strength of 7 or more (i.e. documents
that had seven or more references in common with one of the author's cited
relevant papers of grade (1), (2) or (3)). Of the 213 documents produced in this way,
only the unexpectedly small number of 15 had already been assessed as possibly
relevant by the students. The balance of 198 were submitted, along with the
student-assessed documents, in the second communication to the authors. This time,
for reasons considered later, the authors were requested to do three things:
1. To make a relevance assessment of the new documents submitted, in
relation to their search questions, using the same relevance scale as before.
2. To examine the selected questions (which they themselves had originally
asked), and to indicate the relative importance of each term or concept in the
question by marking with a 'weight' from the following scale:-
(i) A paper that did not cover this term would be of no use.
(ii) It is desirable that this term should be covered by the document.
(iii) This is a term which is not absolutely essential to the enquiry.
3. To list any alternative terms or concepts that might be used in a search
programme for the questions and, if necessary, to include a completely rephrased
version of the question.
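The selection rule for the extra documents described above (a coupling strength of 7 or more with one of the author's cited relevant papers) can be sketched as follows. This is only an illustrative sketch and not the procedure actually used on the project, which is described in chapter 7; the function names, the document identifiers and the representation of each document as a set of reference identifiers are all assumptions made for the example.

```python
from typing import Dict, Set


def coupling_strength(refs_a: Set[str], refs_b: Set[str]) -> int:
    """Number of references the two documents have in common."""
    return len(refs_a & refs_b)


def coupled_documents(
    references: Dict[str, Set[str]],  # hypothetical: document id -> ids of the papers it cites
    seed_ids: Set[str],               # hypothetical: the author's cited relevant papers
    threshold: int = 7,               # the coupling strength quoted in the text
) -> Set[str]:
    """Documents sharing at least `threshold` references with any one seed paper."""
    found: Set[str] = set()
    for seed in seed_ids:
        seed_refs = references[seed]
        for doc_id, refs in references.items():
            if doc_id in seed_ids:
                continue  # a seed paper is not retrieved by itself or by another seed
            if coupling_strength(seed_refs, refs) >= threshold:
                found.add(doc_id)
    return found


# Toy data, invented for illustration only.
toy_references = {
    "seed": {f"r{i}" for i in range(10)},
    "a": {f"r{i}" for i in range(7)},   # seven references in common with "seed"
    "b": {f"r{i}" for i in range(6)},   # six in common, below the threshold
}
selected = coupled_documents(toy_references, {"seed"})
```

Coupling is tested against each seed paper separately, in line with the wording "in common with one of the author's cited relevant papers"; pooling the references of all seed papers before counting would be a different and looser rule.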
A xerox copy of the questions as he originally wrote them was sent to each
author, together with a list of the new documents submitted, giving authors, titles
and bibliographical references. Against each such document submitted was indicated
the question to which the document was thought to be relevant and, to assist the
relevance assessment, a xerox copy of each document abstract was included. Each
of the questions was re-submitted on a separate sheet, with space provided for
alternative words to be added, either against each single term or against the
concepts of the questions. Examples of the above are given in Appendix 3B.
Most authors received a total of at least eleven sheets for examination, which,
together with the abstracts of the documents submitted, made a somewhat daunting
package. In spite of this, 144 out of 182 authors (79.1%) returned completed forms,
with yet others being unable to help and some having changed addresses as before.
Our main concern was to obtain the relevance assessments, which were needed for
283 of the questions, and the authors' responses provided assessments for 201 of
these. A further 78 questions had not been resubmitted to the authors because no possibly
relevant documents had been noted; adding these to the 201 questions where the
relevance assessments had been completed meant that there was a total of 279
questions which could be used. This fell slightly short of the 300 questions
originally planned; as will be considered later, we were by this time beginning to
suspect that the test would provide more data than could be handled or would be
required, and therefore no effort was made to bring the total number of questions
back to 300.
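The figures quoted in this section can be checked with a few lines of arithmetic. The sketch below uses only numbers stated in the text; the variable names are illustrative and carry no meaning beyond this example.

```python
# Documents produced by bibliographic coupling at strength 7 or more.
coupled_total = 213
already_assessed_by_students = 15
newly_submitted = coupled_total - already_assessed_by_students  # 198

# Response to the second communication to the authors.
authors_contacted = 182
authors_replied = 144
response_rate = round(100 * authors_replied / authors_contacted, 1)  # 79.1 per cent

# Questions available for the test.
questions_assessed = 201        # of the 283 needing assessments
questions_not_resubmitted = 78  # no possibly relevant documents had been noted
usable_questions = questions_assessed + questions_not_resubmitted  # 279

planned_questions = 300
shortfall = planned_questions - usable_questions  # 21
```

On these figures the usable set falls 21 questions short of the 300 originally planned.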