MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

and other devices to improve detection of significant clues to subject content. Representative examples of such work will be discussed below. In addition, investigators abroad have developed modifications to the basic Luhn word frequency approach which appear to be necessary when it is applied to languages other than English. 1/ Thus, for example, Purto reports various investigations conducted by V. A. Argayev and V. V. Borodin and by himself with respect to Russian language documents. 2/ Purto notes first that the Luhn method as applied to Russian language materials selects sentences which, while having the largest "significance coefficients", were not those most essential to the meaning and further that: "an abstract in Russian made by Luhn's method results in a choice of sentences not conveying basic information and not logically connected with each other." 3/ The reasons for such failure he attributes to the fact that words with different frequencies are considered equally important within a sentence for sentence selection purposes and to the lack of consideration for semantic and grammatical connectivity between significant words and between sentences. He then discusses several methods for determining connectivity, such as the rule that the sentences most closely connected with each other will be those in which the greatest number of the same significant words occur. 4/

A somewhat different example of difficulties occurring when the basic Luhn technique is applied to material in languages other than English is given by Levery. He describes a study of thirty French texts concerned with the development and manufacture of glass.
He reports as follows:

"While we followed the classical idea that a relationship between the frequency of a word and its significance exists, the fact that we worked with French texts forced us to discount the value of frequency alone.

"French authors generally do not like to repeat the same words, and they vary their vocabulary... It was necessary to combine the frequencies of words with the same meanings or related to the same idea.

"A dictionary of synonyms was constructed. . . (and) different versions of the same word had to be regrouped." 5/

1/ Note, however, that in the automatic abstracting program at Thompson Ramo-Wooldridge, small-scale experiments suggest that automatic abstracting is as feasible for other Indo-European languages as for English (1963 [603], p. ii). Also, at the Centre d'Etudes Nucléaires, Saclay, automatic extraction experiments are being applied to texts both in French and other languages; see National Science Foundation's CR&D report No. 6, [430], p. 20.
2/ Purto, 1962 [484]. He refers to a report "The problem of automatic abstracting and a means of solving it", by Argayev and Borodin, apparently available only as a typescript dated 1959.
3/ Ibid., p. 3.
4/ Ibid., pp. 3-4.
5/ Levery, 1963 [359], p. 235.
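
The adjustment Levery describes, combining the frequencies of words that share a meaning before judging significance, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the synonym table, the concept labels, the function name `concept_frequencies`, and the sample sentence are hypothetical and are not taken from Levery's dictionary or texts.

```python
from collections import Counter

# Hypothetical synonym dictionary: each variant maps to one concept label.
# These entries are invented for illustration only.
SYNONYMS = {
    "verre": "glass",
    "vitre": "glass",
    "four": "furnace",
    "fourneau": "furnace",
}


def concept_frequencies(words, synonyms):
    """Count occurrences per concept, merging words listed as synonyms.

    Words without an entry in the synonym table are counted under
    their own lowercased form.
    """
    return Counter(synonyms.get(w.lower(), w.lower()) for w in words)


# Invented sample sentence in which the author varies the vocabulary.
text = "Le verre sort du four puis la vitre refroidit pres du fourneau"
words = text.split()

plain = Counter(w.lower() for w in words)       # frequency of surface words
freqs = concept_frequencies(words, SYNONYMS)    # frequency after regrouping
```

In this toy input each of the four glass and furnace variants occurs only once as a surface word, so a plain frequency count would rank none of them as significant; after regrouping, each concept is counted twice.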