MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Compiled by Machine
chapter
Mary Elizabeth Stevens
National Bureau of Standards
A second method, implied in Fano's suggestions for the use of relative frequencies
of association between items found in the literature, is one of citingness, which groups
together items that cite one or more identical references. This method has been
developed by Kessler and his associates as the technique of "bibliographic coupling"
(Kessler, [317] through [323]). The purpose here is to identify groupings of related
items where relatedness is defined in terms of the number of references shared by each
of the members of the group with some given test paper or with each other. It is noted
that where the citedness index and the reference list typically give the bibliographic
references themselves as the searching or retrieval tool, the bibliographic coupling
technique seeks rather to define groups of similar papers. 1/ A third method, and one
which may be combined with either of the other two, is to derive indexing terms for a
given paper from the overlay of indexing terms previously assigned to any papers which
it cites. Salton 2/ further suggests that:
"... Citation indexes could be used to extend a given set of index terms by
starting with the terms attached to a given document or document set, and
adding to them the `related' terms obtained from new documents which cite
the original ones."
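The third method and Salton's suggestion can be illustrated with a minimal sketch. It assumes a citation map (paper to the papers it cites) and a table of previously assigned index terms; the names CITES, INDEX_TERMS, terms_from_cited, and extend_terms, and the sample data, are hypothetical and are not taken from the report or from Salton's system.

```python
from collections import Counter

# Hypothetical sample data: which papers each paper cites, and the index
# terms already assigned to some of the papers.
CITES = {"P4": ["P1", "P2"], "P5": ["P1"], "P6": ["P3"]}
INDEX_TERMS = {
    "P1": ["indexing", "citation"],
    "P2": ["citation", "retrieval"],
    "P3": ["translation"],
    "P5": ["citation", "coupling"],
    "P6": ["syntax"],
}


def terms_from_cited(paper, cites, index_terms):
    """Overlay the terms previously assigned to the papers that `paper` cites.

    Returns a Counter, so a term shared by several cited papers gets a
    higher count and can be ranked first as a candidate indexing term.
    """
    overlay = Counter()
    for cited in cites.get(paper, ()):
        overlay.update(index_terms.get(cited, ()))
    return overlay


def extend_terms(seed_docs, cites, index_terms):
    """Extend the terms of a document set with terms of documents citing it.

    Returns (original terms, newly added 'related' terms).
    """
    seed = set(seed_docs)
    original = set()
    for doc in seed:
        original |= set(index_terms.get(doc, ()))
    related = set()
    for citing, refs in cites.items():
        if citing not in seed and seed & set(refs):
            related |= set(index_terms.get(citing, ()))
    return original, related - original


if __name__ == "__main__":
    print(terms_from_cited("P4", CITES, INDEX_TERMS).most_common())
    print(extend_terms(["P1"], CITES, INDEX_TERMS))
```

In this sketch the overlay is a simple count; any weighting or cutoff applied to the candidate terms would be a further design choice not specified in the text.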
The suggested advantages of citation indexing include the claims that this tool does
not require trained indexers, 3/ that it is highly susceptible to mechanization (Garfield,
1955 [213], 1956 [212], 1957 [211]; Atherton, 1962 [25]; Becker and Hayes, 1963 [45]),
and that it may cost significantly less than subject indexing. 4/ A major advantage
claimed is responsiveness to user, rather than indexer, interests and viewpoints.
Some of the representative claims with respect to this factor are as follows:
1/ See Atherton and Yovich, 1962 [26], p. 3: "Kessler's method, however, does not
retrieve the references cited by a paper. Instead these references are examined
to determine the `bonds' between papers; e.g., if two papers share six references
in common, they are said to have a `coupling strength' of six. By applying either
of two criteria of coupling, one can `filter out smaller groups of papers' related
to a given paper."
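The coupling strength described in this note can be sketched as follows. This is a minimal illustration, not Kessler's program: the function names, the sample reference lists, and the min_strength threshold are assumptions introduced here, and the threshold stands in only loosely for the coupling criteria mentioned in the quotation.

```python
def coupling_strength(refs_a, refs_b):
    """Number of references that two papers have in common."""
    return len(set(refs_a) & set(refs_b))


def coupled_to(test_paper, references, min_strength=1):
    """Papers sharing at least `min_strength` references with `test_paper`.

    `references` maps each paper to its reference list. The result is a
    list of (paper, strength) pairs, strongest coupling first.
    """
    base = references[test_paper]
    group = []
    for paper, refs in references.items():
        if paper == test_paper:
            continue
        strength = coupling_strength(base, refs)
        if strength >= min_strength:
            group.append((paper, strength))
    # Sort by descending strength, then by paper name for a stable order.
    return sorted(group, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical reference lists: A and B share six references.
REFERENCES = {
    "A": ["r1", "r2", "r3", "r4", "r5", "r6", "r7"],
    "B": ["r1", "r2", "r3", "r4", "r5", "r6", "r9"],
    "C": ["r1", "r8"],
    "D": ["r10"],
}

if __name__ == "__main__":
    print(coupling_strength(REFERENCES["A"], REFERENCES["B"]))
    print(coupled_to("A", REFERENCES))
    print(coupled_to("A", REFERENCES, min_strength=2))
```

Raising min_strength filters the group down to the more strongly coupled papers, which is one simple way to obtain the smaller groups the quotation refers to.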
2/ Salton, 1962 [520], p. 111-8; see also Lesk, 1963 [356].
3/ Atherton, 1962 [25], p. 3.
4/ See Atherton and Yovich, 1962 [26], pp. 3-4: "Garfield estimates cost of abstracting
and indexing 200,000 articles in one year to be $3 million. He estimates the
cost of a citation index for these same articles (approximately 3 million citations)
to be $300,000." See also Doyle, 1963 [162], p. 8: "The editing labor, the input
preparation cost, and the automatic processing time are all so small that it's very
likely citation indexing is destined for a great surge of popularity in the immediate
future."
5/ Committee on Scientific Information, 1963 [135], pp. 55-56: "Because the indexing
is based on the author's rather than on an indexer's estimate of what articles
are related to what other articles, citation indexes are particularly responsive to
the user's, rather than to the indexer's viewpoint."