CRANV1P1
ASLIB Cranfield Research Project: Factors Determining the Performance of Indexing Systems: VOLUME 1. Design, Part 1. Text
Additional Tests
chapter
Cyril Cleverdon
Jack Mills
Michael Keen
Cranfield
An investigation supported by a grant to Aslib by the National Science Foundation.
Use, reproduction, or publication, in whole or in part, is permitted for any purpose of the United States Government.
on both the indexing of these documents and also their abstracts. Subsequently
the decision was taken to extend this work so as to cover the whole of the Cranfield
collection.
Citation indexing
It was in 1961 that the first major grant was given for a citation index (ref. 32),
and the following year we were asked for our views on how a citation index could be
evaluated. Citation indexing is basically a method of forming classes of documents
which are all related through a common reference to a base document. There will,
of course, be many occasions when the class consists of only a single entry; however,
in the more numerous cases where the class consists of two or more documents,
then citation indexing can be considered equivalent to bibliographic coupling at a
strength of one. Bibliographic coupling has been developed by Dr. M. Kessler at
the Massachusetts Institute of Technology (ref. 20) and, in relation to citation indexing,
can be considered as a precision device, since it progressively narrows the class of
documents as the number of common references demanded increases. Citation
indexing and bibliographic coupling could therefore be tested in the same way as any
other device; it was, however, necessary to prepare an index for this purpose.
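The narrowing effect of coupling strength can be illustrated with a minimal sketch. It is not the procedure used in the project, which was carried out by hand on cards. The function names are hypothetical, and apart from document 1067 and the codes A25, W32 and A23, which the text associates with it, the document numbers and reference codes are invented for illustration.

```python
# Toy data: citing document code -> set of cited reference codes.
# Only the entry for 1067 is taken from the text; the rest is hypothetical.
references = {
    1067: {"A25", "W32", "A23"},
    1101: {"A25", "W32"},
    1102: {"A25"},
    1103: {"Z99"},
}


def coupling_strength(doc_a, doc_b, refs=references):
    """Number of references the two documents have in common."""
    return len(refs[doc_a] & refs[doc_b])


def coupled_class(base, min_strength, refs=references):
    """Documents sharing at least `min_strength` references with `base`."""
    return sorted(
        doc
        for doc in refs
        if doc != base and coupling_strength(base, doc, refs) >= min_strength
    )


print(coupled_class(1067, 1))  # [1101, 1102]
print(coupled_class(1067, 2))  # [1101]
print(coupled_class(1067, 3))  # []
```

At a strength of one the class is the same as that given by a citation index entry shared with the base document; each increase in the required strength can only keep the class the same or reduce it.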
The first stage was to prepare xerox copies of the citations in the 1400 documents
in the test collection; against each citation was put the code number of the citing
document, after which the citations were cut apart so that each appeared on a separate slip
of paper. This resulted in some 20,000 slips of many different sizes, which had to
be sorted into author alphabetical order. This being done, the slips were pasted
onto sheets of paper; where two or more slips related to the same cited document,
only one example was pasted in; the references to the additional citing documents
were entered alongside. This can be seen in Fig. 7.1, which covers a series of
references to papers by H.J. Allen, in particular a paper written with A.J. Eggers
entitled 'A study of the motion and aerodynamic heating of missiles entering the earth's
atmosphere at high supersonic speeds' (NACA TN 4047). This is shown to have
been cited by thirteen papers in the test collection.
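The slip-sorting procedure described above amounts to grouping citing documents under each cited reference and ordering the entries by author. A minimal sketch follows, assuming each slip is represented as a pair of citing-document code and an author-first reference string. The function name `build_citation_index` and all sample entries other than the Allen and Eggers paper are hypothetical, as are the citing codes.

```python
from collections import defaultdict


def build_citation_index(slips):
    """Group slips by cited reference, in author alphabetical order.

    `slips` is an iterable of (citing_code, cited_reference) pairs, where
    the reference string begins with the author's surname.
    """
    grouped = defaultdict(set)
    for citing_code, cited_reference in slips:
        # Duplicate slips for the same cited document collapse into one
        # entry; only the additional citing code is recorded.
        grouped[cited_reference].add(citing_code)
    return {ref: sorted(grouped[ref]) for ref in sorted(grouped)}


# Hypothetical slips; the citing codes are invented for illustration.
slips = [
    (1067, "Allen, H.J. and Eggers, A.J., NACA TN 4047"),
    (1210, "Smith, X., hypothetical report"),
    (1105, "Allen, H.J. and Eggers, A.J., NACA TN 4047"),
    (1067, "Smith, X., hypothetical report"),
    (1300, "Brown, Y., hypothetical report"),
]

index = build_citation_index(slips)
for cited, citing in index.items():
    print(cited, "<-", citing)
```

Each key of the result corresponds to one pasted slip, and the list against it to the citing-document numbers entered alongside.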
This procedure resulted in a normal citation index; to obtain the index for
bibliographic coupling required three further stages. First, each cited reference
having two or more citations was given a code number, the paper by Allen and Eggers
being A25, and a separate card was prepared for each cited reference. On this
reference card was written the code for the cited document (e.g. A25) and then, in
numerical order, the codes for the citing documents. Fig. 7.2 illustrates the
reference card prepared in connection with the paper by Allen and Eggers shown in
Fig. 7.1.
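The preparation of reference cards can be sketched as follows. The text does not state how code numbers such as A25 were assigned, so the scheme used here (initial letter of the first author plus a running number within that letter) is an assumption made purely for illustration, as are the function name `make_reference_cards` and the sample entries.

```python
from collections import defaultdict


def make_reference_cards(citation_index):
    """Return {reference_code: sorted citing codes} for cited references
    that have two or more citations.

    `citation_index` maps an author-first reference string to the codes
    of the documents citing it.
    """
    counters = defaultdict(int)
    cards = {}
    for cited in sorted(citation_index):
        citing = sorted(citation_index[cited])
        if len(citing) < 2:
            # A reference cited only once cannot couple two documents,
            # so no card is prepared for it.
            continue
        letter = cited[0].upper()
        counters[letter] += 1
        # Assumed coding scheme: author initial plus running number.
        cards[f"{letter}{counters[letter]}"] = citing
    return cards


# Hypothetical citation index entries.
citation_index = {
    "Allen, H.J. and Eggers, A.J., NACA TN 4047": [1067, 1003, 1210],
    "Adams, X., hypothetical report": [1500],
    "Brown, Y., hypothetical report": [1100, 1003],
}

cards = make_reference_cards(citation_index)
print(cards)  # {'A1': [1003, 1067, 1210], 'B1': [1003, 1100]}
```

Each item of the result plays the part of one reference card: the key is the code for the cited document, and the list gives the citing documents in numerical order.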
The reference cards were sorted into numerical order depending on the lowest
number on each card. Since these numbers represented the codes for the citing
documents, they ranged from 1001-2400. Each card was then taken in turn, and the
information from all reference cards having the same starting number was transferred
to a master card. As an example, the master card shown in Fig. 7.3 illustrates the
position with regard to document 1067, this number being posted in the top left-hand
corner. In the column headings are entered the code numbers for the documents which
have been cited by document 1067, this information being obtained from the reference
cards such as Fig. 7.2, and these being, in this particular case, A25, W32, A23, E23,
F90, and O24. In the first column of the master card are entered the document numbers
of all other citing papers, this information being again obtained from the reference
cards. A tick is put against each number in the column under the appropriate heading