MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Indexing
Mary Elizabeth Stevens
National Bureau of Standards
1.1 Definitions and Background
The noun "index" has as its most general meaning "something used or serving to
point out, a sign, token, or indication" (American College Dictionary), or "that which
shows, indicates, manifests, or discloses; a token or indication" (Webster's International
Dictionary, 2nd Edition, unabridged). More specifically, an index is "a pointer or key
which directs the searcher to recorded information." 1/ The terms "index" and "indexing"
have been used in the fields of library science and documentation with reference to the fact
that the selection of information pertinent to a particular problem or interest, from all the
previously recorded information available, involves problems of decision-making based
on less than the full content or text of each of the records being searched.
Short of complete scanning of all the possibly relevant material, it is necessary to
select or "distill" condensed representations or surrogates 2/ for each item. These
surrogates are intended to direct the searcher to the most probably pertinent items in a
collection. The operations known as "indexing" thus involve:
(1) Choosing clues that will serve to identify, for purposes of later retrieval, a
particular book, document, or other recorded item, and
(2) Either marking on the item itself or recording as a separate item-surrogate
the tags, labels, or codes representing these clues.
The second of these two steps can be purely clerical in nature, but the first has been,
to date, primarily the result of human intellectual efforts in subject content analysis.
Well-known inadequacies of human indexing operations include both those stemming
from man himself and those which result from the volume and the character of the
materials with which he deals. On the human side, there are fundamental questions of
perception, comprehension and judgment, as well as those of inter-indexer and even
intra-indexer consistency. In addition, the indexer is asked to guess in advance what others
will ask for, understand, and find relevant in future searches. He is even asked, in effect,
to anticipate the language of future inquiries. Thus, a somewhat facetious definition of the
noun "index" has a considerable sting of truth: "A system of analyzing information in
which the method used to choose categories is carefully hidden from the user. An attempt
to outguess the future." 3/
The nature of the material to be indexed, especially in the area of scientific
information, raises a number of crucial problems. The still increasing spate of production of
technical literature and reports poses not only the problems of sheer volume in terms of
1/ Crane and Bernier, 1958 [144], p. 513.
(Note: Full citations of references are given in the bibliography by author and by
numerical order of the figures in brackets.)
2/ See, for example, R. E. Wyllys, 1962 [651], for discussion of the two-fold purposes
of condensed representations: to serve a search-tool function on the one hand and
a content-revealing one on the other.
3/ Vanby, 1963 [622], p. 143.