MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
3.3. Frequencies of Word n-tuples - Oswald and Others
The first alternative to the basic Luhn word frequency approach in automatic
abstracting techniques to be actively explored was apparently that of Oswald and his
associates (Oswald et al., 1959 [459]; Edmundson et al., 1959 [180]). Like Baxendale,
Oswald was interested in word pairs and word groups, particularly compound-noun and
adjective-noun compositions, as more revelatory of meaning than single words. Unlike
Baxendale, however, he was interested in the word group itself as selection criterion,
whereas she had used word group or phrase clues for the selection of (usually) single
indexing terms. Differences between their two approaches, both representing very early
efforts in the field, are summarized by Edmundson and Wyllys as follows:
"Oswald's experiment in automatic abstracting differs from Luhn's and Baxendale's
techniques in that it combines the notion of significance as a function of word
frequency and the notion of significance as a function of word groupings, by employing
juxtapositions of significant words as the basic unit for measuring the importance
of a sentence...
"It may further be observed that Baxendale's exhibited indexes are made up of single
words rather than word groups, in spite of the strong case she makes for using
groups...
"Baxendale's work is concerned solely with the automatic construction of indexes;
she does not extend her treatment of word significance into the area of automatic
abstracting." 1/
Oswald's "multiterms", however, were intended to overcome, in the areas of both
automatic indexing and automatic abstracting, at least some of the difficulty that concepts
are often expressed in compound nouns, word pairs, and longer groups of words
consisting of n-tuples of substantive words or of phrases. The result of considering both word
frequency and word-group frequency is that in Oswald's selection groups it is usually the
case that only one word of the group has an individually high frequency but the co-
occurrence feature heightens the significance of the relatively lower frequency words
with which it appears. Thus, for automatic indexing, Oswald proposed significant word
groups as indexing terms, and his criteria for selection of sentences to be included in
machine-generated extracts are similarly based on the number of significant groups in
the sentences chosen.
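
The selection principle described above can be illustrated with a minimal sketch. It is not Oswald's actual procedure, whose details are not given here. It assumes that a "group" is simply a pair of directly juxtaposed substantive words, that a pair is significant when it recurs and at least one of its members is individually frequent, and that a sentence is scored by the number of significant pairs it contains. The stop list, the thresholds min_word_freq and min_pair_freq, the sample text DOC, and all function names are hypothetical choices made only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop list of non-substantive words; the source does not give one.
STOPWORDS = {"the", "of", "a", "an", "and", "in", "is", "are",
             "to", "for", "by", "on", "with"}


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def adjacent_pairs(tokens):
    """Yield pairs of directly juxtaposed substantive (non-stop) words."""
    for left, right in zip(tokens, tokens[1:]):
        if left not in STOPWORDS and right not in STOPWORDS:
            yield (left, right)


def significant_groups(sentences, min_word_freq=3, min_pair_freq=2):
    """Return word pairs that recur and contain at least one frequent word.

    Both thresholds are illustrative assumptions, not values from the source.
    """
    word_counts = Counter()
    pair_counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        word_counts.update(t for t in tokens if t not in STOPWORDS)
        pair_counts.update(adjacent_pairs(tokens))
    frequent = {w for w, c in word_counts.items() if c >= min_word_freq}
    return {
        pair for pair, count in pair_counts.items()
        if count >= min_pair_freq
        and (pair[0] in frequent or pair[1] in frequent)
    }


def rank_sentences(sentences, groups):
    """Score each sentence by how many significant pairs it contains.

    Returns (score, sentence_index) tuples, highest score first,
    with ties kept in document order.
    """
    scored = []
    for index, sentence in enumerate(sentences):
        hits = sum(1 for p in adjacent_pairs(tokenize(sentence)) if p in groups)
        scored.append((hits, index))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return scored


# Hypothetical four-sentence "document" used only to exercise the sketch.
DOC = [
    "Automatic indexing relies on word frequency.",
    "The word frequency of a term is counted by machine.",
    "Machine indexing of word groups is harder.",
    "Word groups and word frequency carry more meaning than single words.",
]

groups = significant_groups(DOC, min_word_freq=4, min_pair_freq=2)
ranking = rank_sentences(DOC, groups)
```

With these assumed thresholds only "word" is individually frequent in the sample text, yet the pairs ("word", "frequency") and ("word", "groups") are both selected, so the lower-frequency words gain significance through co-occurrence. The last sentence contains both pairs and therefore ranks first as an extract candidate.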
Other investigators who have stressed the importance of word pairs and longer groups
as necessary to reflect concepts include Bar-Hillel (1959 [33]), Black (1963 [64]), Clark
(1960 [123]), Doyle (1959 [165]), and Salton (1963 [519]). Doyle says succinctly that
"when a phrase, or some other aggregation of words, stands for a single idea, its
frequency in a document ought to interest us more than the frequencies of its component
words." 2/ Salton considers it desirable to use word groups rather than individual words
1/ Edmundson and Wyllys, 1961 [181], pp. 231-232.
2/ Doyle, 1959 [165], p. 11.