NIST Monograph 91 (MONO91): Automatic Indexing: A State-of-the-Art Report
Chapter: Indexes Generated by Machine - Automatic Derivative Indexing
Mary Elizabeth Stevens, National Bureau of Standards

3.3. Frequencies of Word n-tuples - Oswald and Others

The first alternative to the basic Luhn word-frequency approach in automatic abstracting to be actively explored was apparently that of Oswald and his associates (Oswald et al., 1959 [459]; Edmundson et al., 1959 [180]). Like Baxendale, Oswald was interested in word pairs and word groups, particularly compound-noun and adjective-noun compositions, as more revelatory of meaning than single words. Unlike Baxendale, however, he was interested in the word group itself as a selection criterion, whereas she had used word-group or phrase clues to select (usually) single indexing terms. The differences between their two approaches, both very early efforts in the field, are summarized by Edmundson and Wyllys as follows:

"Oswald's experiment in automatic abstracting differs from Luhn's and Baxendale's techniques in that it combines the notion of significance as a function of word frequency and the notion of significance as a function of word groupings, by employing juxtapositions of significant words as the basic unit for measuring the importance of a sentence...

"It may further be observed that Baxendale's exhibited indexes are made up of single words rather than word groups, in spite of the strong case she makes for using groups...

"Baxendale's work is concerned solely with the automatic construction of indexes; she does not extend her treatment of word significance into the area of automatic abstracting." 1/

Oswald's "multiterms", however, were intended to overcome, in both automatic indexing and automatic abstracting, at least part of the difficulty that concepts are often expressed in compound nouns, word pairs, and longer groups of words consisting of n-tuples of substantive words or of phrases. Because Oswald considers both word frequency and word-group frequency, in the groups he selects usually only one word has an individually high frequency, but the co-occurrence heightens the significance of the relatively lower-frequency words with which it appears. Thus, for automatic indexing, Oswald proposed significant word groups as indexing terms, and his criteria for selecting sentences to be included in machine-generated extracts are similarly based on the number of significant groups in the sentences chosen.

Other investigators who have stressed that word pairs and longer groups are necessary to reflect concepts include Bar-Hillel (1959 [33]), Black (1963 [64]), Clark (1960 [1z3]), Doyle (1959 [165]), and Salton (1963 [519]). Doyle says succinctly that "when a phrase, or some other aggregation of words, stands for a single idea, its frequency in a document ought to interest us more than the frequencies of its component words." 2/ Salton considers it desirable to use word groups rather than individual words

1/ Edmundson and Wyllys, 1961 [181], pp. 231-232.
2/ Doyle, 1959 [165], p. 11.