MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Indexing
Mary Elizabeth Stevens
National Bureau of Standards
A somewhat premature attempt was made to establish a subscription service for
KWIC indexes for a number of journals, for initial distribution beginning January 1,
1959.1/ Called PILOT (Permutation Indexed Literature Of Technology), the proposed
service was advertised as "a revolutionary new totally cross-referenced index ... and it
will be produced at the speed of light". Figure 1 is a reproduction of a part of the
brochure issued in 1958 by Permutation Indexing, Incorporated, Sol Grossman, President,
Los Angeles. Although, perhaps unfortunately, too few subscription orders were received
to support the ambitious coverage planned, work on permuted title indexing elsewhere
soon led to the publication of such indexes on a production basis.
As of February 1964, there are more than 40 examples of KWIC and other variants
of permuted keyword indexing in production use or otherwise available to the
searcher. KWIC-type techniques have also been extended to special one-time index
compilations and to other applications, such as the "automated content analysis" of verbal
protocols from psychiatric interviews and group leadership training sessions (Ford, 1963 [198];
Hart and Bach, 1959 [256]; Jaffe, 1962 [294] and 1958 [296]; Stone et al., 1962 [575]).
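To make the idea of a permuted keyword (KWIC) index concrete, the following is a minimal Python sketch, not a reconstruction of PILOT or of any production system cited here. It assumes titles are whitespace-separated strings and uses a small, hypothetical stop list; the names `kwic_index`, `format_entry`, `KwicEntry`, and `STOP_WORDS` are illustrative only. Each significant title word becomes one entry, the entries are alphabetized on that word, and the surrounding title words are kept as context so that every keyword lines up in a single column.

```python
from typing import NamedTuple

# Hypothetical stop list for illustration only; each real KWIC system kept its own.
STOP_WORDS = frozenset({"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"})
PUNCTUATION = ".,:;()\"'"


class KwicEntry(NamedTuple):
    keyword: str  # normalized indexing word; entries are alphabetized on it
    left: str     # title words that precede the keyword
    right: str    # title words that follow the keyword
    doc_id: str   # identifier of the document the title belongs to


def kwic_index(titles: dict[str, str]) -> list[KwicEntry]:
    """Return one entry per significant title word, sorted alphabetically."""
    entries: list[KwicEntry] = []
    for doc_id, title in titles.items():
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip(PUNCTUATION)
            if not key or key in STOP_WORDS:
                continue  # stop words and bare punctuation are never indexed
            entries.append(
                KwicEntry(key, " ".join(words[:i]), " ".join(words[i + 1:]), doc_id)
            )
    # Sort on keyword, then on the context that follows it, then on document id.
    entries.sort(key=lambda e: (e.keyword, e.right.lower(), e.doc_id))
    return entries


def format_entry(entry: KwicEntry, width: int = 30) -> str:
    """Render one line so that every keyword starts in the same column."""
    left = entry.left[-width:].rjust(width)  # keep only the rightmost `width` chars
    tail = f"{entry.keyword.upper()} {entry.right}".rstrip()
    return f"{left}  {tail}  [{entry.doc_id}]"


if __name__ == "__main__":
    sample = {
        "D1": "Automatic Indexing of Technical Literature",
        "D2": "Permuted Title Indexing for Chemical Abstracts",
    }
    for e in kwic_index(sample):
        print(format_entry(e))
```

With the two sample titles, "indexing" yields two adjacent entries, one from each title, which is the property that lets a searcher scan every title sharing a word in one place. Truncating the left context on the left (rather than wrapping it around, as some printed KWIC formats did) is a simplifying choice made for this sketch.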
The same period during which the ICSI was planned and held (1957-1958) was also
marked by the first issue of Current Research and Development in Scientific Documentation,
published by the National Science Foundation. That issue and subsequent ones reported
other early efforts in machine-compiled indexes, in the construction and use of
special thesauri, and in indexing and retrieval experiments based on machine processing
of text. Thus, for example, punched-card methods for compiling printed indexes and
announcement lists were under consideration at Bell Laboratories and at Esso Research
and Engineering. Special attention was being given to thesauri as early as July 1957 at
both Chemical Abstracts Service and the Cambridge Language Research Unit, and
at Ramo Wooldridge it was reported that "Research on the problems of fully automatic
indexing and retrieval based on raw text input to a general-purpose computer is under way." 2/
Nevertheless, as of the present date, it is still moot whether automatic indexing is
possible, in the sense of substituting machinable procedures for the human intellectual
effort normally required to identify, categorize, classify, index, select, and list
particular items in a collection. Opinions run the gamut from extreme pessimism,
"Mechanization of abstracting and indexing is rejected as impractical for the foreseeable
future" 3/ to enthusiastic optimism, "The conclusion that automatic indexing and
cataloging is superior to human indexing and cataloging is both provocative
and remarkable." 4/
Borko and Bernick claim that "... Raw data, i.e., unedited natural language text,
can be processed statistically so as to automatically assign index terms to each document
and to classify the document into a subject category; this has been demonstrated." 5/ On
the other hand, Farradane thinks that any form of mechanized processing in indexing
1/ See Linder, 1960 [363], p. 99 and Figure 1.
2/ National Science Foundation's CR&D Reports No. 1 [430], pp. 4, 6; No. 3 [430], pp. 12, 19, 31.
3/ Bar-Hillel, 1958 [33], abstract.
4/ Swanson, 1962 [584], p. 468.
5/ Borko and Bernick, 1963 [78], p. 28.