NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Automatic Assignment Indexing Techniques
Mary Elizabeth Stevens
National Bureau of Standards
Modifications to derivative indexing techniques that tend toward normalizations of
terminology and word usage, and increasingly sophisticated proposals for machine use
of syntactic, semantic, and contextual clues hold out the promise of a transition to more
truly "subject" indexing and to automatic assignment indexing systems.
4. AUTOMATIC ASSIGNMENT INDEXING TECHNIQUES
Whether indexing by machine is possible depends in part on whether what a machine can
achieve is properly termed "indexing". If "indexing" is defined as being more than the
mere extraction of words from titles, abstracts, or text, then automatic derivative
indexing, even when augmented by various modifications, normalizations, and editings,
does not provide affirmative evidence. Under concept-oriented definitions of indexing,
the question becomes whether automatic assignment indexing is possible. Experimental
evidence suggesting that it is will be presented in this section.
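To make the distinction concrete, the following minimal Python sketch contrasts the two approaches on a single title. It assumes a toy stop list and a toy vocabulary of descriptors, each paired with clue words; `STOP_WORDS`, `CLUE_WORDS`, and the function names are hypothetical and are not taken from any of the systems discussed in this section. Derivative indexing can only return words present in the text, while assignment indexing returns descriptors from the vocabulary, which need not appear in the text at all.

```python
import re

# Hypothetical stop list; operational systems used much longer ones.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "on", "for", "to", "is", "by"}

# Hypothetical controlled vocabulary: each descriptor is paired with
# clue words whose presence in the text suggests that descriptor.
CLUE_WORDS = {
    "NUCLEAR REACTIONS": {"neutron", "fission", "nucleus", "isotope"},
    "INFORMATION RETRIEVAL": {"search", "retrieval", "index", "document"},
}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def derivative_index(text: str) -> set[str]:
    """Derivative indexing: the index terms are words extracted from the text."""
    return {w for w in tokenize(text) if w not in STOP_WORDS}


def assignment_index(text: str) -> set[str]:
    """Assignment indexing: the index terms are descriptors from the vocabulary,
    assigned when any of their clue words occurs in the text, whether or not
    the descriptor itself appears there."""
    words = set(tokenize(text))
    return {d for d, clues in CLUE_WORDS.items() if words & clues}


title = "Neutron capture in the fission of uranium"
print(sorted(derivative_index(title)))
print(sorted(assignment_index(title)))
```

Running it prints `['capture', 'fission', 'neutron', 'uranium']` and then `['NUCLEAR REACTIONS']`. Letting a single clue word trigger a descriptor is a deliberate simplification; the experiments described below weigh their clues in various ways.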
We should note first, however, that there are differences of opinion as to what
"indexing" means, in particular as to whether it represents concepts rather than
extracted words. There are also a number of conflicting definitions of "indexing" as
distinct from "classifying". For some, the distinction turns on the number of labels or
surrogates assigned to a single item to represent its subject content, ranging from the
assignment of a single subject category in a classification scheme of mutually exclusive
classes to the assignment of a number of terms or descriptors, each standing for one of a
number of aspects of the subject. For our purposes, however, we shall regard both
indexing with a number of descriptors and classifying to a single category or subject
heading as within the province of automatic assignment indexing, reserving the term
"automatic classification" for the case where the machine is used to establish the
classification or categorization scheme itself.
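The difference between indexing with several descriptors and classifying to a single category can be sketched as two decision rules applied to the same clue-word scores. In this minimal Python illustration the categories, clue words, the `min_score` threshold, and all function names are hypothetical, and scoring by counting clue-word occurrences is an assumed simplification rather than the procedure of any experiment described in this section.

```python
import re
from typing import Optional

# Hypothetical subject categories and their clue words, for illustration only.
CLUE_WORDS = {
    "REACTOR PHYSICS": {"reactor", "neutron", "moderator", "criticality"},
    "RADIATION SHIELDING": {"shielding", "gamma", "attenuation", "concrete"},
    "NUCLEAR STRUCTURE": {"nucleus", "spin", "shell", "level"},
}


def clue_scores(text: str) -> dict[str, int]:
    """For each category, count the word occurrences in the text that are its clue words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {c: sum(1 for w in words if w in clues) for c, clues in CLUE_WORDS.items()}


def index_with_descriptors(text: str, min_score: int = 1) -> list[str]:
    """Indexing: assign every descriptor whose score reaches min_score (several may apply)."""
    return sorted(c for c, s in clue_scores(text).items() if s >= min_score)


def classify_single(text: str) -> Optional[str]:
    """Classifying: assign the one highest-scoring category, or None if no clue word occurs.
    On a tie, max() keeps the category listed first in CLUE_WORDS."""
    category, score = max(clue_scores(text).items(), key=lambda item: item[1])
    return category if score > 0 else None


doc = "Neutron flux and criticality in a reactor with concrete shielding"
print(index_with_descriptors(doc))
print(classify_single(doc))
```

For this sample text, `index_with_descriptors` returns both `RADIATION SHIELDING` and `REACTOR PHYSICS`, while `classify_single` returns only `REACTOR PHYSICS` (three clue words against two). Breaking ties by list order is arbitrary; a real scheme of mutually exclusive classes would need an explicit rule.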
Actual experiments in automatic assignment indexing by Borko, Borko and Bernick,
Maron, Salton, Stevens and Urban, Swanson, and Williams will be discussed briefly
below. These discussions are generally in chronological order of first reporting of
results, except that the Salton-Lesk-Storm work, which reflects a somewhat different
principle of assignment from the clue-word methods, is described after the others.
Some of the similarities and differences between the various methods are then
indicated. A brief final subsection covers related assignment indexing proposals for
which experimental data are not available or have not yet been reported in the
literature.
4.1 Swanson and Later Work at Thompson Ramo-Wooldridge
Research on fully automatic indexing, as well as on full text searching and retrieval,
at the Ramo-Wooldridge Corporation was reported to be under way at least as early as
the spring of 1958. 1/ As described elsewhere in this report, experiments in search and
retrieval based on full natural-language text had used short articles in the field of
nuclear physics as test items. Some of this same material was also used in additional
experiments representing a preliminary "clue word" approach to automatic indexing
procedures.
1/ National Science Foundation's CR&D rept. no. 2 [430], p. 32.