MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

(2) computation of functions proportional to the number of initially occurring nouns for each sentence, and (3) the preparation of a normalized graph for initial noun occurrences by plotting the functional values against each sentence in the text. 1/ Sentence selection can then proceed by detecting "peaks" on the graph, using a relative criterion or weighting function to minimize the effect of high first-noun counts in the beginning sentences of a paper. Trials were made with a number of different weighting formulas, and the best of these involved taking moving averages of first-noun counts over several adjacent sentences. A particular formula covering a span of seven sentences gave results that appear to emphasize contextual effects and to reduce the effect of any single sentence with a large number of new nouns, such as a listing of proper names. The resulting abstracts are quite lengthy (e.g., comprising 20 percent or more of the original text) and contain some relatively uninformative sentences. The investigators think that the results with respect to satisfactory abstracting are inconclusive but provocative. They also conclude that the possibilities for indexing are more immediately promising: "Most key definitions are retained in the successful summaries, and the vocabulary reflects the topics covered in the texts." 2/

Other examples of mixed-system experimentation, especially involving the use of syntactic and semantic considerations, include the work at the General Electric Computer Department under Spangler, and work by Jacobson and Plath.
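The first-noun procedure described earlier in this section (counting initially occurring nouns per sentence, smoothing the counts over seven adjacent sentences, and picking "peaks") can be illustrated with a minimal Python sketch. The exact weighting formula used by Lesk and Storm is not given here, so the unweighted centered mean, the truncated windows at the ends of the text, and the simple local-maximum test are all assumptions made for illustration. The function names are hypothetical, and the nouns of each sentence are assumed to have been identified beforehand.

```python
def first_noun_counts(sentence_nouns):
    """Count, for each sentence, the nouns not seen in any earlier sentence.

    sentence_nouns: list of lists of noun strings, one list per sentence
    (noun identification itself is assumed to be done elsewhere).
    """
    seen = set()
    counts = []
    for nouns in sentence_nouns:
        new = {n.lower() for n in nouns} - seen
        counts.append(len(new))
        seen |= new
    return counts


def moving_average(values, span=7):
    """Unweighted centered moving average (an assumed formula).

    Windows are truncated at the start and end of the text.
    """
    half = span // 2
    smoothed = []
    for i in range(len(values)):
        window = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


def select_peaks(smoothed):
    """Return indexes of interior sentences that are local maxima."""
    peaks = []
    for i in range(1, len(smoothed) - 1):
        if smoothed[i] > smoothed[i - 1] and smoothed[i] >= smoothed[i + 1]:
            peaks.append(i)
    return peaks
```

In use, the three functions would be chained: `select_peaks(moving_average(first_noun_counts(sentence_nouns)))` gives the indexes of candidate sentences. A relative criterion to discount the high counts in the opening sentences, as the report mentions, is not modeled here.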
In the Phoenix laboratories of General Electric, a KWIC-type indexing program can be applied both to titles and to running text, and a contemplated extension is intended to "generate indexes by means of word analysis, taking into consideration syntactic and semantic aspects of text lines". 3/ Jacobson describes rules for machine determination of same-meaning occurrences of words that may be homographic, and for selection of descriptors for indexing simple paragraphs by choosing words that occur at least twice with a high probability of having the same meaning. 4/ Plath reports:

"Although sentences occur in which the key term or phrase lies buried deep down in the structure, preliminary observations indicate that there are many others in which the semantic hierarchy closely parallels that of the syntactic structure. This suggests that more sensitive vocabulary statistics for purposes of automatic abstracting may be obtainable by considering only words occurring in positions above a predetermined cut-off level in the sentence structure. Alternatively, one might count occurrences of words on each level, and then multiply by a fixed weighting factor in each instance before taking the overall totals." 5/

1/ Lesk and Storm, 1962 [358], pp. 1-2, I-[OCRerr] ff.
2/ Ibid., p. 1-31.
3/ National Science Foundation's CR&D Report No. 11 [430], p. 21.
4/ Jacobson, 1963 [292], pp. 191-192.
5/ Plath, 1962 [474], p. 190.
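Plath's two suggestions quoted above (counting only words above a cut-off level, or weighting counts by level) can be sketched in Python as follows. This is an illustration under stated assumptions, not Plath's procedure: each word is assumed to arrive already tagged with its depth in the sentence structure (0 for the top level), "above the cut-off" is read as inclusive of the cut-off level, and the weight table is a hypothetical input supplied by the caller.

```python
from collections import Counter, defaultdict


def counts_above_cutoff(tagged_words, cutoff):
    """Count only words at or above the cut-off level.

    tagged_words: iterable of (word, level) pairs; level 0 is assumed
    to be the top of the sentence structure.
    """
    return Counter(word for word, level in tagged_words if level <= cutoff)


def weighted_counts(tagged_words, level_weights):
    """Count words on each level, multiply by that level's fixed weight,
    then sum over levels. Levels missing from level_weights get weight 0.
    """
    per_level = defaultdict(Counter)
    for word, level in tagged_words:
        per_level[level][word] += 1

    totals = defaultdict(float)
    for level, counter in per_level.items():
        weight = level_weights.get(level, 0.0)
        for word, count in counter.items():
            totals[word] += weight * count
    return dict(totals)
```

For example, with `tagged = [("index", 0), ("machine", 1), ("index", 2)]` and hypothetical weights `{0: 1.0, 1: 0.5, 2: 0.25}`, the word "index" scores 1.25 under the weighted scheme but is counted only once under a cut-off of 1.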