MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
in its appendices, which was not the principal concern of the author and may not even have
been considered significant by him. The claim that the author, who knows his own subject
best, has already indexed his paper best by his choice of words and emphasis in text, and
especially in his title, is pertinent only to that main subject to which he addresses himself,
not to the other potentially useful information which he may also disclose.
Other extrinsic factors affecting title adequacy and hence the effectiveness of title-
indexes are the size and the relative homogeneity or heterogeneity of the collection or set
of documents so indexed, the breadth or narrowness of the subject field or fields covered,
the time period covered and whether for one or many fields. Whether or not material in
more than one language is included is a special factor. These various factors interact in
various ways, usually with disadvantageous effects when even the most "nondescript"
human indexer (that is, one who accepts only words from the text itself) is replaced by
"a keypunch operator whose job it is to convert the keywords into machine-readable form,
and a machine whose job it is to assimilate machine-readable text and print out its per-
mutations with each significant word serving as an access point." 1/
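The permutation step described in the quotation can be illustrated with a minimal sketch. It is not Herner's procedure or that of any particular KWIC program: the stop list, the " / " marker for the original start of the title, and the function names kwic_entries and kwic_index are all assumptions made for illustration.

```python
# Minimal keyword-in-context (KWIC) sketch. The stop list below is a
# hypothetical, deliberately tiny one; real systems used longer lists.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to"}


def kwic_entries(title, stop_words=STOP_WORDS):
    """Return (keyword, rotated_title) pairs, one per significant word."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        # Normalize the candidate keyword: strip punctuation, lowercase.
        key = word.strip('.,;:"()').lower()
        if not key or key in stop_words:
            continue
        # Rotate the title so the keyword leads; " / " marks where the
        # original title began (an assumed display convention).
        rotated = " ".join(words[i:])
        if i:
            rotated += " / " + " ".join(words[:i])
        entries.append((key, rotated))
    return entries


def kwic_index(titles):
    """Collect the entries for every title and sort them by keyword."""
    entries = []
    for title in titles:
        entries.extend(kwic_entries(title))
    return sorted(entries)


if __name__ == "__main__":
    for key, line in kwic_index(["Automatic Indexing of Titles",
                                 "Indexing by Machine"]):
        print(f"{key:<12}{line}")
```

Because every word not on the stop list becomes an access point exactly as it appears in the title, a sketch of this kind does nothing about synonymy, homography, or subject scatter; two titles file together only when they happen to share the same word form.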
The difficulties of subject scatter, synonymy, homography, redundancy, and the
like, however, will also occur in human indexing that relies heavily on title only, which
is perhaps more frequently the case than is generally recognized, 2/ just as much as for
machine-generated indexes involving the permutations of keywords in titles. Such dis-
advantages must therefore be balanced not only against the advantages of speed, timeliness,
having an index announcement tool personally available at low cost, and the like, but also
against the probability of obtaining as useful a tool within the limits of available human
indexing resources and justifiable costs. Cleverdon, for example, comments as follows:
"There are those who would say that this [KWIC] can in no way be called indexing,
and that the value of such indexing must be very much lower than that done by
intelligent trained human beings. This is a comfortable thought, but such small
evidence as is at present available makes it appear doubtful as to whether it is
entirely true. This is not to say that a human being cannot do a better job, but it
certainly appears likely that the cost of employing a human being to do it is of
doubtful economic value." 3/
1/ Herner, 1962 [266], p. 4.
2/ See, for example, Moss, 1962 [425], p. 39: "I am convinced that a great many of
the UDC and other numbers which are provided on millions of cards in technical
libraries up and down the country, and which look so erudite, are, in fact, no more
than cards transliterating titles, with occasionally similar transliteration of a few
randomly chosen words from the abstracts as well. . . We are, in effect, already
largely using title indexing and complicating it unnecessarily by magic numbers."
See also Crane and Bernier, 1958 [144], p. 514: "Some indexes to periodicals,
particularly word indexes, are merely indexes of titles of papers or of abstracts."
3/ Cleverdon, 1961 [125], pp. 107-108.