MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Compiled by Machine
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Kessler himself and his associates have also conducted some experiments in
comparative evaluation of indexing aids derived from citation data on the one hand and
from conventional subject indexing on the other. The basis for evaluation was a total of
334 papers published in The Physical Review in 1958. The study involved detailed
comparison of the ways in which these papers fell into related groups according to the
"analytic subject index" used by the journal's editors and according to the method of
"bibliographic coupling". The essentials of the latter method are described as follows:
"a. A single item of reference used by two papers is called one unit of coupling
between them.
"b. A number of papers constitute a related group GA, if each member of the
group has at least one coupling unit to a given test paper, P0.
"c. The coupling strength between P0 and any member of GA is measured by
the number of coupling units (n) between them." 1/
For the 334 papers, 73 categories of the Analytic Subject Index (ASI) had been used.
For the bibliographic coupling method, each of the papers was in turn considered as the
test paper and groups were formed for any of the 333 other papers that shared one or
more citations with it. In general, it was concluded that there was good correlation
between the groupings of papers achieved by the two methods. It should be noted, however,
that 44 papers fell into no groups at all on the basis of the bibliographic coupling
criterion. 2/
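The coupling rules quoted above, together with the procedure of treating each paper in turn as the test paper, can be sketched roughly as follows. This is only an illustrative sketch, not Kessler's program: the toy reference lists and the names REFERENCES, coupling_strength, related_group, and uncoupled_papers are assumptions introduced here.

```python
from typing import Dict, Set

# Hypothetical toy data (not from the study): paper id -> set of items it cites.
REFERENCES: Dict[str, Set[str]] = {
    "P0": {"r1", "r2", "r3"},
    "P1": {"r2", "r3", "r9"},
    "P2": {"r3"},
    "P3": {"r7"},
}


def coupling_strength(refs: Dict[str, Set[str]], a: str, b: str) -> int:
    """Rules (a) and (c): each shared reference is one coupling unit; n is their count."""
    return len(refs[a] & refs[b])


def related_group(refs: Dict[str, Set[str]], test_paper: str) -> Dict[str, int]:
    """Rule (b): every other paper with at least one coupling unit to the test paper.

    Returns a mapping from group member to its coupling strength n.
    """
    group: Dict[str, int] = {}
    for other in refs:
        if other == test_paper:
            continue
        n = coupling_strength(refs, test_paper, other)
        if n >= 1:
            group[other] = n
    return group


def uncoupled_papers(refs: Dict[str, Set[str]]) -> Set[str]:
    """Papers that, taken as the test paper, form an empty group."""
    return {paper for paper in refs if not related_group(refs, paper)}


if __name__ == "__main__":
    print(related_group(REFERENCES, "P0"))
    print(uncoupled_papers(REFERENCES))
```

In this toy set, P3 shares no reference with any other paper, so it plays the role of the papers that fell into no group under the coupling criterion.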
Salton and associates at the Harvard Computation Laboratory are also concerned
with the citation indexing principle as a possible basis for grouping similar documents.
They are also concerned with evaluation of results so obtained by comparison with
document groups obtained by subject indexing means. In the comparative experiments,
data were first compiled for a closed document set of 62 items as to similarities with
respect to both "citedness" and "citingness". The same items were manually indexed
and similarity coefficients between these items were derived from overlappings of
assigned index terms. When the two measures of similarity were compared with each
other and with document associations obtained by random assignments of "citations" and
"terms", the conclusions reached were as follows:
"The similarity coefficients obtained by comparing overlapping citations for a
sample document collection with overlapping, manually generated index terms
are much larger than those obtained by assuming a random assignment of
citations and terms to the documents; relatively large similarity coefficients
are generated for nearly all documents which exhibit at least a minimum
number of citations; little seems to be gained by using citation links of length
greater than two; for early documents, citedness furnishes a better indication
than the amount of citing, and vice versa for recent documents; for documents
which can both cite and be cited, equally good indications seem to be obtained
by comparing citing and cited documents." 3/
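The kind of comparison described above, between similarity derived from overlapping citations and similarity derived from overlapping index terms, might be sketched as below. The passage does not state which similarity coefficient was used, so the Jaccard-style overlap ratio here is an assumption, as are the toy data and the names CITES, TERMS, invert, and overlap_coefficient.

```python
from typing import Dict, Set

# Hypothetical closed document set: document id -> documents it cites.
CITES: Dict[str, Set[str]] = {
    "D1": {"D3", "D4"},
    "D2": {"D3", "D4", "D5"},
    "D3": set(),
    "D4": {"D3"},
    "D5": set(),
}

# Hypothetical manually assigned index terms for two of the documents.
TERMS: Dict[str, Set[str]] = {
    "D1": {"citation", "indexing"},
    "D2": {"citation", "retrieval"},
}


def invert(cites: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Build the "cited by" relation from the "cites" relation."""
    cited_by: Dict[str, Set[str]] = {doc: set() for doc in cites}
    for doc, targets in cites.items():
        for target in targets:
            cited_by.setdefault(target, set()).add(doc)
    return cited_by


def overlap_coefficient(a: Set[str], b: Set[str]) -> float:
    """Assumed measure: shared items divided by all distinct items (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


if __name__ == "__main__":
    cited_by = invert(CITES)
    # "Citingness": overlap of the documents that D1 and D2 cite.
    print(overlap_coefficient(CITES["D1"], CITES["D2"]))
    # "Citedness": overlap of the documents that cite D3 and D4.
    print(overlap_coefficient(cited_by["D3"], cited_by["D4"]))
    # Term-based similarity for the same kind of pairwise comparison.
    print(overlap_coefficient(TERMS["D1"], TERMS["D2"]))
```

Comparing such coefficients pair by pair, and against coefficients computed from randomly assigned citations and terms, is the general shape of the experiment; the actual measures and procedures are those given in the cited report.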
1/ Kessler, 1963 [32?], p. 1, footnote.
2/ Ibid., p. 5.
3/ Salton, 1962 [?], p. 111-42.