MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Other Potentially Related Research chapter Mary Elizabeth Stevens National Bureau of Standards

Certain difficulties are self-evident. Consider, for example, the admittedly hypothetical text which might refer in various places to the "dissolute, disreputable, illiterate, elder Lincoln" (underlining supplied) and which might be so processed by machine as to imply that Lincoln the son was, although also President of the United States, "dissolute," "disreputable," "illiterate," and "elder." These, however, are difficulties that plague almost any machine processing of natural language text.

Climenson, Hardwick, and Jacobson have explored some of the possibilities of the Harris approach in experimental computer programs for the RCA 501 (1961 [133]). Specific features of these programs include:

1. Establishment of the syntactic class or classes to which a given word can belong, by dictionary lookup.
2. Investigations of sentence structure and context in an attempt to resolve the homographic ambiguities involved when the same word may function either as a noun or a verb.
3. Isolation and marking of sentence segments, such as noun phrases, prepositional phrases, adverbial phrases, and verb phrases.
4. Identification and marking of segments -- clauses or degenerate clauses.

On a very preliminary basis, a limited set of word and phrase deletion rules was set up and several sample documents were processed against them, yielding reductions to about 35 percent of the original text. These results suggest that "syntactical filtering criteria" might be applied to the improvement of modified derivative indexing techniques, such as the word-frequency counting techniques, either by deleting syntactically insignificant parts of selected sentences, or by counting identical phrases rather than words.
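The idea of syntactic filtering followed by phrase counting can be illustrated with a minimal modern sketch. It is not the RCA 501 program of Climenson et al.: the toy lexicon, the class labels, the single homograph rule (a noun/verb word is read as a verb only when it follows a pronoun or a noun), and the deletion rule (keep only adjective and noun runs) are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical toy dictionary: word -> possible syntactic classes.
LEXICON = {
    "the": {"DET"}, "of": {"PREP"}, "we": {"PRON"},
    "large": {"ADJ"}, "quickly": {"ADV"}, "machine": {"NOUN"},
    "index": {"NOUN", "VERB"}, "indexes": {"NOUN", "VERB"},
    "documents": {"NOUN", "VERB"},
}

KEEP = {"ADJ", "NOUN"}  # assumed deletion rule: drop every other class


def lookup(word):
    """Dictionary lookup; unknown words are assumed to be nouns."""
    return LEXICON.get(word.lower(), {"NOUN"})


def tag(words):
    """Assign one class per word, resolving noun/verb homographs by context."""
    tags = []
    for word in words:
        classes = lookup(word)
        if classes == {"NOUN", "VERB"}:
            prev = tags[-1] if tags else None
            # Illustrative rule only: verb after a pronoun or noun, else noun.
            tags.append("VERB" if prev in ("PRON", "NOUN") else "NOUN")
        else:
            tags.append(sorted(classes)[0])
    return tags


def noun_phrases(words, tags):
    """Return maximal runs of kept words; deleted words end a phrase."""
    phrases, current = [], []
    for word, t in zip(words, tags):
        if t in KEEP:
            current.append(word.lower())
        elif current:
            phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def phrase_counts(sentences):
    """Count identical phrases, rather than single words, across sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        counts.update(noun_phrases(words, tag(words)))
    return counts


sample = [
    "the machine indexes large documents",
    "we index large documents quickly",
    "the index of large documents",
]
counts = phrase_counts(sample)
```

In this sketch the phrase "large documents" is counted as one unit in all three sample sentences, whereas "index" contributes to the counts only in the sentence where the context rule reads it as a noun. A realistic rule set would need a far larger dictionary and many more context rules than the one shown.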
The investigators conclude, however, that:

"A formal linguistic approach to the problems of natural language processing promises to yield results vital to the success of automatic indexing and data extraction. But the work required in such an approach will be quite arduous; a long-range man-machine effort will be required to formulate practical machine programs for indexing and abstracting." 1/

A final special case of linguistic data processing involving syntactic analysis is that of Langevin and Owens. They claim:

"A critical review of the analysis work done on the Nuclear Test Ban Treaty by use of the Multiple Path Syntactic Analyzer demonstrates that such a device can, even at present, provide a powerful technique for the systematic discovery of ambiguities in treaties and other documents. Because the analyzer operates without bias from the overall context of the document, it may sometimes be possible for it to discover ambiguities that would easily escape a human reviewer who knows what the document is 'supposed to say'." 2/

1/ Climenson et al., 1961 [133], p. 182.
2/ Langevin and Owens, 1963 [346], p. 26.