NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Mary Elizabeth Stevens
National Bureau of Standards

1.1 Definitions and Background

The noun "index" has as its most general meaning "something used or serving to point out, a sign, token, or indication" (American College Dictionary), or "that which shows, indicates, manifests, or discloses; a token or indication" (Webster's International Dictionary, 2nd Edition, unabridged). More specifically, an index is "a pointer or key which directs the searcher to recorded information." 1/

The terms "index" and "indexing" have been used in library science and documentation to reflect the fact that selecting the information pertinent to a particular problem or interest, from all the previously recorded information available, involves decision-making based on less than the full content or text of each of the records being searched. Short of completely scanning all the possibly relevant material, it is necessary to select or "distill" condensed representations, or surrogates, 2/ for each item. These surrogates are intended to direct the searcher to the most probably pertinent items in a collection. The operations known as "indexing" thus involve:

(1) Choosing clues that will serve to identify, for purposes of later retrieval, a particular book, document, or other recorded item, and
(2) Either marking on the item itself, or recording as a separate item-surrogate, the tags, labels, or codes representing these clues.

The second of these two steps can be purely clerical in nature, but the first has been, to date, primarily the result of human intellectual effort in subject content analysis. Well-known inadequacies of human indexing operations include both those stemming from man himself and those which result from the volume and the character of the materials with which he deals.
On the human side, there are fundamental questions of perception, comprehension, and judgment, as well as questions of inter-indexer and even intra-indexer consistency. In addition, the indexer is asked to guess in advance what others will ask for, understand, and find relevant in future searches. He is even asked, in effect, to anticipate the language of future inquiries. Thus, a somewhat facetious definition of the noun "index" carries a considerable sting of truth: "A system of analyzing information in which the method used to choose categories is carefully hidden from the user. An attempt to outguess the future." 3/

The nature of the material to be indexed, especially in the area of scientific information, raises a number of crucial problems. The still-increasing spate of technical literature and reports poses not only the problems of sheer volume in terms of

1/ Crane and Bernier, 1958 [144], p. 513. (Note: Full citations of references are given in the bibliography by author and by numerical order of the figures in brackets.)
2/ See, for example, R. E. Wyllys, 1962 [651], for discussion of the two-fold purposes of condensed representations: to serve a search-tool function on the one hand and a content-revealing one on the other.
3/ Vanby, 1963 [622], p. 143.