MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
(2) computation of functions proportional to the number of initially occurring nouns for
each sentence, and (3) the preparation of a normalized graph for initial noun occurrences
by plotting the functional values against each sentence in the text. 1/ Sentence selection
can then proceed by processes to detect "peaks" on the graph, using a relative criterion
or weighting function to minimize the effect of high first-noun counts in the beginning
sentences of a paper.
Trials were made with a number of different weighting formulas, and the best of these
involved the obtaining of moving averages of first-noun counts over several adjacent
sentences. A particular formula covering a span of seven sentences gave results that
appear to emphasize contextual effects and to reduce the effects of a particular single
sentence with a large number of new nouns, such as a listing of proper names. The
resulting abstracts are quite lengthy (e.g., comprising 20 percent or more of the original
text), and contain some relatively uninformative sentences. The investigators think that
the results with respect to satisfactory abstracting are inconclusive but provocative. They
also conclude that the possibilities for indexing are more immediately promising: "Most
key definitions are retained in the successful summaries, and the vocabulary reflects the
topics covered in the texts." 2/
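The smoothing and peak-selection steps described above can be illustrated with a minimal sketch. It is not the investigators' formula, which is not reproduced in this report. It assumes that the nouns of each sentence have already been identified by some other means, that the seven-sentence span is a simple centered average with positions beyond the ends of the text counted as zero, and that a "peak" is a local maximum of the smoothed values. The function names first_noun_counts, moving_average, and select_peaks are hypothetical.

```python
from typing import List, Sequence, Set


def first_noun_counts(sentence_nouns: Sequence[Sequence[str]]) -> List[int]:
    """For each sentence, count the nouns that have not occurred in any earlier sentence."""
    seen: Set[str] = set()
    counts: List[int] = []
    for nouns in sentence_nouns:
        new = {n.lower() for n in nouns} - seen
        counts.append(len(new))
        seen |= new
    return counts


def moving_average(values: Sequence[float], span: int = 7) -> List[float]:
    """Centered moving average over `span` sentences.

    Assumption: positions outside the text count as zero, so the sum is
    always divided by `span`. This damps values near the start and end.
    """
    half = span // 2
    smoothed: List[float] = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / span)
    return smoothed


def select_peaks(smoothed: Sequence[float]) -> List[int]:
    """Return indices of sentences whose smoothed value is a local maximum."""
    peaks: List[int] = []
    for i, v in enumerate(smoothed):
        left = smoothed[i - 1] if i > 0 else float("-inf")
        right = smoothed[i + 1] if i + 1 < len(smoothed) else float("-inf")
        if v > left and v >= right:
            peaks.append(i)
    return peaks
```

Averaging over neighboring sentences spreads the contribution of any single sentence with many new nouns, such as a list of proper names, across the whole span, which is the contextual effect noted above.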
Other examples of mixed-system experimentation, especially involving the use of
syntactic and semantic considerations, include the work at the General Electric Computer
Department under Spangler, and work by Jacobson and Plath. In the Phoenix laboratories
of General Electric, a KWIC type indexing program can be applied both to titles and to
running text and a contemplated extension is intended to "generate indexes by means of
word analysis, taking into consideration syntactic and semantic aspects of text lines". 3/
Jacobson describes rules for machine determinations of same-meaning occurrences of
words which may be homographic and for selection of descriptors for indexing simple
paragraphs by choosing words occurring at least twice with a high probability of having the
same meaning. 4/ Plath reports:
"Although sentences occur in which the key term or phrase lies buried
deep down in the structure, preliminary observations indicate that there
are many others in which the semantic hierarchy closely parallels that
of the syntactic structure. This suggests that more sensitive vocabulary
statistics for purposes of automatic abstracting may be obtainable by
considering only words occurring in positions above a predetermined
cut-off level in the sentence structure. Alternatively, one might count
occurrences of words on each level, and then multiply by a fixed
weighting factor in each instance before taking the overall totals." 5/
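The two alternatives Plath mentions can be sketched as follows. This is an illustration only, not Plath's program. It assumes that a syntactic analyzer has already assigned each word a structural depth, with 0 as the top level of the sentence, and that the words arrive as (word, depth) pairs. The function names and the example weights are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Token = Tuple[str, int]  # (word, depth in the sentence structure; 0 = top level)


def cutoff_counts(tokens: Iterable[Token], cutoff: int) -> Counter:
    """Count only words at or above the cut-off level (depth <= cutoff)."""
    return Counter(word.lower() for word, depth in tokens if depth <= cutoff)


def weighted_counts(
    tokens: Iterable[Token],
    level_weights: Dict[int, float],
    default: float = 0.0,
) -> Dict[str, float]:
    """Weight each occurrence by a fixed factor for its level, then total per word.

    Adding the level weight once per occurrence is equivalent to counting
    occurrences on each level and multiplying each count by that level's factor.
    """
    totals: Dict[str, float] = {}
    for word, depth in tokens:
        key = word.lower()
        totals[key] = totals.get(key, 0.0) + level_weights.get(depth, default)
    return totals


# Hypothetical example: depths would come from a syntactic analysis.
tokens = [("index", 0), ("machine", 1), ("index", 2), ("text", 2)]
top_level = cutoff_counts(tokens, cutoff=1)
weighted = weighted_counts(tokens, {0: 1.0, 1: 0.5, 2: 0.25})
```

The cut-off variant is the special case of the weighted variant in which levels above the cut-off have weight 1 and all deeper levels have weight 0.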
1/ Lesk and Storm, 1962 [358], pp. 1-2, I-[OCRerr] ff.
2/ Ibid., p. 1-31.
3/ National Science Foundation's CR&D Report No. 11 [430], p. 21.
4/ Jacobson, 1963 [292], pp. 191-192.
5/ Plath, 1962 [474], p. 190.