MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Such obvious positional clues as occurrences of words in titles, chapter or section
headings, and figure captions have already been mentioned. To these can be added the first
and last sentences of paragraphs, 1/ or the first and last paragraphs as such. 2/ Wyllys
observes that other criteria which are detectable in the text by straightforward machine
procedures can be based on such features as italicization, capitalization, or punctuation.
He notes, however, that such "editorial" criteria vary from journal to journal so that
their usefulness would need to be related to the particular practices of individual
journals. 3/
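The first-and-last-sentence clue can be illustrated with a minimal sketch. It is not drawn from any of the cited studies: the function names are hypothetical, and the sentence splitter is a deliberately naive assumption (it breaks on ".", "!" or "?" followed by whitespace, so abbreviations would be split incorrectly).

```python
import re


def split_sentences(paragraph):
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def positional_candidates(paragraphs):
    """Collect the first and last sentence of each paragraph, in document order."""
    picked = []
    for para in paragraphs:
        sentences = split_sentences(para)
        if not sentences:
            continue  # skip empty paragraphs
        picked.append(sentences[0])
        if len(sentences) > 1:
            # A one-sentence paragraph contributes its sentence only once.
            picked.append(sentences[-1])
    return picked
```

A fuller treatment along the lines Wyllys describes would add journal-specific "editorial" clues such as italics or capitalization, which this sketch ignores.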
Somewhat more difficult for machine implementation, but certainly feasible in the
present state of the programming art, is the use of specific semantic or syntactic clues.
Here again, Luhn, Baxendale, and Edmundson and Wyllys all anticipate their critics and
later investigators. Luhn recognized the fact that in at least some applications the
characterization of documents by isolated words alone would fail to provide an effective
degree of discrimination. He, therefore, suggested operations to establish word
relationships, whether based on co-occurrences or combinations of specific parts of
speech. 4/ Baxendale clearly uses both syntactic and semantic clues, detectable by
built-in table lookups.
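One way to picture word relationships based on co-occurrence is to count how often two content words appear in the same sentence. The sketch below is an illustration under stated assumptions, not Luhn's procedure: the stop-word list, the tokenizer, and the function names are all hypothetical, and the sentence is assumed to be the unit of co-occurrence.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stop-word list (assumption); a real system would use a larger table.
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "by", "for"}


def content_words(sentence):
    """Lowercase alphabetic tokens with stop words removed."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]


def cooccurrence_counts(sentences):
    """Count, for each unordered word pair, the sentences containing both words."""
    counts = Counter()
    for sentence in sentences:
        unique = sorted(set(content_words(sentence)))
        # Sorting makes each pair appear in one canonical order.
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts
```

Pairs with high counts could then serve as candidate two-word index terms; combinations based on parts of speech would require a tagging step that is outside this sketch.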
Representative suggestions by Edmundson or Wyllys or both as co-authors include
the following:
"We have in mind a glossary or dictionary of perhaps one to two thousand
words that act either as cue words which signal the importance of a sentence
or as stigma words that signal the insignificance of a sentence for purposes of
abstracting." 5/
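The cue-word and stigma-word idea in the quotation can be sketched as a simple sentence score. The two word lists below are hypothetical placeholders, not entries from the glossary the authors had in mind, and the equal weights of +1 and -1 are an assumption made only for illustration.

```python
import re

# Hypothetical glossaries (assumption); the quoted proposal envisages
# one to two thousand entries.
CUE_WORDS = {"significant", "important", "conclude", "results"}
STIGMA_WORDS = {"perhaps", "incidentally", "hardly"}


def sentence_score(sentence, cue=CUE_WORDS, stigma=STIGMA_WORDS):
    """Add 1 for each cue-word occurrence and subtract 1 for each stigma word."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return sum(w in cue for w in words) - sum(w in stigma for w in words)


def rank_sentences(sentences):
    """Order sentences from highest to lowest score (ties keep input order)."""
    return sorted(sentences, key=sentence_score, reverse=True)
```

Under these assumptions, the top-ranked sentences would be the candidates for inclusion in an automatically derived abstract, and strongly negative sentences would be excluded.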
1/ See, for example, Wyllys, 1963 [653], p. 27: "One of the first published studies
in automatic document-content analysis, that of Miss Phyllis Baxendale, brought
out the importance of the first and last sentences in a paragraph as bearers of
a good deal of the content of the paragraph." See also Marthaler, 1963 [399],
p. 25.
2/ Compare Swanson, 1963 [580], p. 1: ". . . Some evidence exists to show that for
short homogeneous articles title and first paragraph are nearly as good as full
text."
3/ Wyllys, 1963 [653], p. 28.
4/ Luhn, 1959 [384], p. 5.
5/ Edmundson, 1962 [178], p. 11.