MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
and other devices to improve detection of significant clues to subject content. Repre-
sentative examples of such work will be discussed below. In addition, investigators
abroad have developed modifications to the basic Luhn word frequency approach which
appear to be necessary when it is applied to languages other than English. 1/
Thus, for example, Purto reports various investigations conducted by V. A. Argayev
and V. V. Borodin and by himself with respect to Russian language documents. 2/ Purto
notes first that the Luhn method as applied to Russian language materials selects
sentences which, while having the largest "significance coefficients", are not those most
essential to the meaning and further that: "an abstract in Russian made by Luhn's method
results in a choice of sentences not conveying basic information and not logically connected
with each other." 3/ The reasons for such failure he attributes to the fact that words with
different frequencies are considered equally important within a sentence for sentence
selection purposes and to the lack of consideration for semantic and grammatical
connectivity between significant words and between sentences. He then discusses several
methods for determining connectivity, such as the rule that the sentences most closely
connected with each other will be those in which the greatest number of the same signifi-
cant words occur. 4/
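
The connectivity rule just described can be illustrated with a minimal sketch. This is not Purto's procedure, only one possible reading of the rule: two sentences are scored by the number of distinct significant words they share. The function names, the sample sentences, and the set of significant words are all hypothetical.

```python
from itertools import combinations


def shared_significant_words(sent_a, sent_b, significant):
    """Count the distinct significant words that occur in both sentences."""
    return len(set(sent_a) & set(sent_b) & significant)


def most_connected_pair(sentences, significant):
    """Return the index pair (i, j) of the two sentences that share
    the greatest number of significant words."""
    return max(
        combinations(range(len(sentences)), 2),
        key=lambda pair: shared_significant_words(
            sentences[pair[0]], sentences[pair[1]], significant
        ),
    )


# Hypothetical, already tokenized sentences and significant-word set.
sentences = [
    ["glass", "furnace", "temperature", "rises"],
    ["furnace", "temperature", "controls", "glass", "quality"],
    ["workers", "left", "early"],
]
significant = {"glass", "furnace", "temperature", "quality"}

print(most_connected_pair(sentences, significant))  # prints (0, 1)
```

Counting shared words on sets rather than on raw token lists is a design choice of this sketch; a weighting by word frequency would be an equally plausible reading.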
A somewhat different example of difficulties occurring when the basic Luhn technique
is applied to material in languages other than English is given by Levery. He describes
a study of thirty French texts concerned with the development and manufacture of glass.
He reports as follows:
"While we followed the classical idea that a relationship between the frequency of
a word and its significance exists, the fact that we worked with French texts forced
us to discount the value of frequency alone.
"French authors generally do not like to repeat the same words, and they vary their
vocabulary... It was necessary to combine the frequencies of words with the same
meanings or related to the same idea."
"A dictionary of synonyms was constructed. . . (and) different versions of the same
word had to be regrouped." 5/
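
The idea of combining the frequencies of words with the same meaning can be sketched as follows. This is an illustrative assumption about how such pooling might be done, not a reconstruction of Levery's program. The synonym table, the concept label "glass", and the token list are invented for the example.

```python
from collections import Counter


def pooled_frequencies(tokens, synonym_to_concept):
    """Count tokens after mapping each word to its concept label.
    Words absent from the synonym table are counted under their own
    lower-cased form."""
    counts = Counter()
    for token in tokens:
        word = token.lower()
        counts[synonym_to_concept.get(word, word)] += 1
    return counts


# Hypothetical synonym dictionary: several French words pooled under one concept.
synonyms = {"verre": "glass", "vitre": "glass", "cristal": "glass"}
tokens = ["verre", "vitre", "four", "cristal", "four", "Verre"]

freq = pooled_frequencies(tokens, synonyms)
print(freq["glass"], freq["four"])  # prints 4 2
```

Taken separately, no single synonym occurs more than twice in this sample, yet the pooled concept reaches a count of four, which is the effect the quoted passage describes.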
1/ Note, however, that in the automatic abstracting program at Thompson Ramo-Wooldridge,
   small-scale experiments suggest that automatic abstracting is as feasible for other
   Indo-European languages as for English (1963 [603], p. ii). Also, at the Centre
   d'Etudes Nucléaire Saclay, automatic extraction experiments are being applied to
   texts both in French and other languages; see National Science Foundation's CR&D
   report No. 6 [430], p. 20.
2/ Purto, 1962 [484]. He refers to a report "The problem of automatic abstracting
   and a means of solving it", by Argayev and Borodin, apparently available only as
   a typescript dated 1959.
3/ Ibid., p. 3.
4/ Ibid., pp. 3-4.
5/ Levery, 1963 [359], p. 235.