MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
line, sentence number and other reference identifications. After re-processing against
a stop list of common words, all other words in the edited text are selected as
candidate index entries; these are then sorted into alphabetical order with subsequent
printout giving each word occurrence followed by the entire sentence which contained it
and the page and other location identifications. This computer output is then post-edited
manually not only to eliminate trivial entries but also to normalize terms and phrases
used.
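
The machine portion of this procedure can be sketched as follows. This is a minimal illustration only, not the program described above: the stop list contents, the function name build_concordance, and the use of a sentence number as the sole location identification are all assumptions made for the example.

```python
import re

# Hypothetical stop list; the list actually used is not given in the text.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "to", "by", "for"}


def build_concordance(sentences):
    """Return (word, sentence_number, sentence) entries in alphabetical order.

    Every occurrence of a word not on the stop list becomes one entry,
    carrying the full sentence that contained it.
    """
    entries = []
    for number, sentence in enumerate(sentences, start=1):
        for word in re.findall(r"[a-z]+", sentence.lower()):
            if word not in STOP_WORDS:
                entries.append((word, number, sentence))
    # Alphabetical by word, then by location within the text.
    entries.sort(key=lambda entry: (entry[0], entry[1]))
    return entries


if __name__ == "__main__":
    text = ["The index is sorted by machine.", "Machine output is edited."]
    for word, number, sentence in build_concordance(text):
        print(f"{word:10s} [{number}] {sentence}")
```

The manual post-editing step (removal of trivial entries, normalization of terms) is deliberately left out of the sketch, since the text describes it as a human task.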
3.2.3 Modified Derivative Indexing - Baxendale's Experiments
As has been previously noted in the introduction to this report, the name of Phyllis
Baxendale together with that of H. P. Luhn is generally accorded credit for pioneering
efforts in the entire area of automatic indexing. Baxendale in particular is generally
credited with the first actual experiments in modified derivative indexing. In investigations
beginning in the late 1950's, she has explored not only statistical approaches to
automatic selection of index terms (based for example on word frequencies) but also the
use of word pairs, word groups, contextual associations, and in particular the subject-
indicating clues of prepositional phrases (Baxendale, 1958 [41], 1961 [40], 1962 [42];
Becker, 1960 [44]; Edmundson and Wyllys, 1961 [181]).
Baxendale began by considering the patterns of scanning that humans typically use
to select "topic" sentences, phrases and words, and she then proceeded to simulate by
computer program the selection of phrases consisting primarily of nouns and modifiers.
In her first experiments (1958 [41]), she used two methods of automatic selection. In
the first procedure, words serving the grammatical functions of pronoun, article,
auxiliary verb, conjunction and the like, were deleted by stop list lookup. Frequency
count statistics were then derived for the remaining words. In her second procedure,
the computer was programmed to select prepositional phrases from text and to use the
four words succeeding the preposition as index entries, unless an additional preposition or
a punctuation mark was encountered first.
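
The two selection procedures just described can be approximated in a short sketch. It is an illustration under stated assumptions, not a reconstruction of Baxendale's program: the word lists FUNCTION_WORDS and PREPOSITIONS are hypothetical, as are the function names, and the tokenization rule (runs of letters or digits count as words, any other non-blank character counts as punctuation) is a simplification chosen for the example.

```python
import re
from collections import Counter

# Hypothetical lists; the lists used in the 1958 experiments are not given here.
FUNCTION_WORDS = {"the", "a", "an", "and", "or", "but", "it", "they",
                  "is", "are", "was", "were", "be", "has", "have"}
PREPOSITIONS = {"of", "in", "on", "for", "with", "by", "from"}

WORD = re.compile(r"[a-z0-9]+")


def frequency_counts(text):
    """First procedure: delete function words by list lookup, then count the rest."""
    words = WORD.findall(text.lower())
    return Counter(w for w in words if w not in FUNCTION_WORDS)


def prepositional_phrases(text, max_words=4):
    """Second procedure: take up to max_words words after each preposition,
    stopping early at another preposition or at a punctuation mark."""
    tokens = re.findall(r"[a-z0-9]+|[^\sa-z0-9]", text.lower())
    phrases = []
    for i, token in enumerate(tokens):
        if token not in PREPOSITIONS:
            continue
        phrase = []
        for nxt in tokens[i + 1:i + 1 + max_words]:
            if nxt in PREPOSITIONS or not WORD.fullmatch(nxt):
                break
            phrase.append(nxt)
        if phrase:
            phrases.append(" ".join(phrase))
    return phrases


if __name__ == "__main__":
    sample = "Selection of index terms from the text of technical documents, by machine."
    print(frequency_counts(sample).most_common(3))
    print(prepositional_phrases(sample))
    # ['index terms', 'the text', 'technical documents', 'machine']
```

Note that the sketch leaves articles inside the selected phrases ("the text"); whether the original second procedure also removed such words is not stated in the passage above.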
In later experiments, Baxendale has explored possible grammatical models "which
would select all and only nouns or adjective-noun combinations". 1/ Taking as an initial
corpus a sample of document titles, rules were devised to reject for human analysis titles
with question-marks and the like, to eliminate numeric information and single symbols,
and to segment the title into its component clauses and phrases by the detection of
commas, periods, and similar clues. By list lookup, certain words are identified as
capable of serving the syntactic functions of being quantifiers, prepositions, or clause
introducers. Special subscripts are then assigned to these words and the subscripts are
examined by machine to provide further segmentation; to delete quantifiers, auxiliary
verbs, or words ending in "ed" or "ing" and preceded by an auxiliary verb; and to
determine relationship functions between the remaining, presumably substantive, words.
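
A rough sketch of this title-processing sequence is given below. It rests on several assumptions that are not in the source: the four word lists are hypothetical, one-letter class codes stand in for the "subscripts", prepositions and clause introducers are treated as the points of further segmentation, and the final step of determining relationship functions between the surviving words is not attempted.

```python
import re

# Hypothetical lookup lists; the actual lists are not reproduced in the text.
QUANTIFIERS = {"some", "several", "many", "all", "various"}
PREPOSITIONS = {"of", "in", "on", "for", "with", "by", "from"}
CLAUSE_INTRODUCERS = {"which", "that", "when", "where", "while"}
AUXILIARIES = {"is", "are", "was", "were", "be", "been", "being",
               "has", "have", "had"}


def tag(word):
    """Assign a class code by list lookup; 'W' marks a presumably substantive word."""
    if word in QUANTIFIERS:
        return "Q"
    if word in PREPOSITIONS:
        return "P"
    if word in CLAUSE_INTRODUCERS:
        return "C"
    if word in AUXILIARIES:
        return "A"
    return "W"


def process_title(title):
    """Return groups of substantive words, or None if the title is
    rejected for human analysis (here: any title with a question mark)."""
    if "?" in title:
        return None
    groups = []
    # Segment on commas, periods, and similar punctuation clues.
    for segment in re.split(r"[,.;:]", title.lower()):
        # Keeping only alphabetic runs longer than one character drops
        # numeric information and single symbols.
        words = [w for w in re.findall(r"[a-z]+", segment) if len(w) > 1]
        current, previous = [], None
        for word in words:
            code = tag(word)
            if code in ("P", "C"):
                # Assumed rule: these classes mark a further segment boundary.
                if current:
                    groups.append(current)
                current = []
            elif code in ("Q", "A"):
                pass  # quantifiers and auxiliary verbs are deleted
            elif previous == "A" and word.endswith(("ed", "ing")):
                pass  # "-ed"/"-ing" word preceded by an auxiliary verb is deleted
            else:
                current.append(word)
            previous = code
        if current:
            groups.append(current)
    return groups


if __name__ == "__main__":
    print(process_title(
        "Some methods of automatic indexing, 1961: results are obtained by machine"))
    # [['methods'], ['automatic', 'indexing'], ['results'], ['machine']]
```

In the example, "indexing" survives because it is not preceded by an auxiliary verb, whereas "obtained" is deleted because it follows "are".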
Still other work by Baxendale has been directed toward the development of frequency
of co-occurrence or textual association of candidate indexing terms. She reports as
follows:
1/ Baxendale, 1961 [40], p. 209.