MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
They indicate, for example, that the preposition "from" serves as a key for the treatment
of two nouns connected by it. 1/ Swanson, describing research project progress at
Ramo-Wooldridge as of 1960, reported to the National Symposium on Machine Translation with
respect to multiple meaning problems as follows:
"We are also investigating the possibility of discovering semantic attributes of
words based upon certain automatically recognizable statistical features of the
context. Our initial endeavor in this direction has been to attempt to discover
a classification system for nouns based upon their frequency spectrum of categories
of modifying adjectives, these categories being automatically recognizable." 2/
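The idea in the quoted passage can be illustrated with a minimal sketch. It is not a reconstruction of the Ramo-Wooldridge programs, whose details are not given here. It assumes that adjective-noun pairs have already been extracted from text and that a small adjective-to-category lexicon is available; the lexicon `ADJECTIVE_CATEGORY`, the function names, and the use of cosine similarity to compare spectra are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical adjective-to-category lexicon (an assumption for illustration,
# standing in for "automatically recognizable" adjective categories).
ADJECTIVE_CATEGORY = {
    "red": "color", "green": "color",
    "heavy": "weight", "light": "weight",
    "rapid": "speed", "slow": "speed",
}


def category_spectra(pairs):
    """Map each noun to counts of the categories of adjectives modifying it.

    `pairs` is an iterable of (adjective, noun) tuples, assumed to have been
    extracted from text beforehand. Adjectives missing from the lexicon are
    skipped.
    """
    spectra = defaultdict(Counter)
    for adjective, noun in pairs:
        category = ADJECTIVE_CATEGORY.get(adjective)
        if category is not None:
            spectra[noun][category] += 1
    return dict(spectra)


def cosine(a, b):
    """Cosine similarity between two category-frequency spectra."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


pairs = [
    ("red", "apple"), ("green", "apple"), ("heavy", "apple"),
    ("red", "car"), ("rapid", "car"),
    ("slow", "train"), ("rapid", "train"),
]
spectra = category_spectra(pairs)
```

Nouns whose spectra are similar under such a measure would be candidates for the same class; how the classes themselves are formed is left open in this sketch.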
3.3 Derivative Indexing From Automatic Abstracting Techniques
While Baxendale's work has had certain points in common with automatic abstracting
or extracting processes, particularly in the use of word frequency statistics and the
consideration of possibilities for first selecting topic sentences, her major interests in
this area have been in automatic indexing as such, rather than in machine selection of
sentences from text to serve as an automatic extract or derivative abstract of the
document. Much of the machine processing to date of full text for documentation
purposes, however, has had the latter goal as the principal research objective.
As we have previously noted, the subject of automatic abstracting or auto-condensation
is not in itself a primary concern of this survey. Nevertheless, the significant
words occurring in the abstract of a document, whether generated by man or by
machine, are obviously good candidates for indexing terms. Moreover, it has been
strongly suggested that the questions of using positional, editorial, and syntactical clues
in order to improve automatic indexing techniques will profit by research that is being
done in both automatic extracting procedures and in other types of linguistic data
processing based upon full text. 3/
3.3.1 Auto-Condensation and Auto-Encoding Techniques of H. P. Luhn
Although Luhn's work in the field of documentation aided by machine has had its best
known and most popular acceptance with respect to the KWIC index proper, even more
provocative possibilities lie in the development of some of the auto-condensation and auto-
encoding techniques which he also proposed, especially for full text processing. In this
area, although he himself has also suggested a variety of possible improvements and
refinements, the actual experimental work done by him and by his associates has mostly
been done on the basis of word frequency statistics.
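As a rough illustration of what processing "on the basis of word frequency statistics" can look like, the following is a deliberately simplified sketch and not Luhn's actual procedure. It treats words that recur at least `min_count` times, after removal of a small stop list, as significant, and it ranks sentences by how many significant words they contain. The stop list, the threshold, and all function names are assumptions made for this example.

```python
import re
from collections import Counter

# Small illustrative stop list (an assumption; a real list would be larger).
STOP_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "by", "for", "on", "are"}


def significant_words(text, min_count=2):
    """Return the non-stop words that occur at least `min_count` times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return {w for w, c in counts.items() if c >= min_count}


def auto_extract(text, top_n=1, min_count=2):
    """Return (candidate index terms, top-scoring sentences).

    Each sentence is scored by the number of significant-word occurrences
    it contains. Ties keep their original text order because Python's sort
    is stable.
    """
    keywords = significant_words(text, min_count)
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    scored = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        hits = sum(1 for t in tokens if t in keywords)
        scored.append((hits, sentence))
    ranked = sorted(scored, key=lambda item: -item[0])
    return keywords, [sentence for _, sentence in ranked[:top_n]]


sample = (
    "Indexing by machine uses word frequency. "
    "Word frequency guides machine indexing of text. "
    "The weather is fine."
)
terms, extract = auto_extract(sample, top_n=1)
```

In this sketch the same set of significant words serves both purposes discussed above: the words themselves are candidate index terms, and the sentences richest in them form the extract.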
1/ Langleben and Shumilina, 1962 [347], p. 109.
2/ Swanson, 1961 [585], pp. 391-392.
3/ See, for example, Wyllys, 1963 [653], p. 7.