MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Certain difficulties are self-evident. Consider, for example, the admittedly hypothetical
text which might refer in various places to the "dissolute, disreputable, illiterate, elder
Lincoln" (underlining supplied) and which might be so processed by machine as to imply
that Lincoln the son was, although also President of the United States, "dissolute,"
"disreputable," "illiterate," and "elder." These, however, are difficulties that plague
almost any machine processing of natural language text.
Climenson, Hardwick, and Jacobson have explored some of the possibilities of the
Harris approach in experimental computer programs for the RCA 501 (1961 [133]).
Specific features of these programs include:
1. Establishment of the syntactic class or classes to which a given word can
belong, by dictionary lookup.
2. Investigations of sentence structure and context in an attempt to resolve the
homographic ambiguities involved when the same word may function either
as a noun or a verb.
3. Isolation and marking of sentence segments, such as noun phrases, pre-
positional phrases, adverbial phrases, and verb phrases.
4. Identification and marking of segments -- clauses or degenerate clauses.
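The first three features listed above can be illustrated with a minimal sketch. It is not the RCA 501 program: the toy lexicon, the class labels (DET, ADJ, PREP, NOUN, VERB, UNK), the single context rule for noun/verb homographs, and the function names lookup, resolve, and segment are all assumptions made for illustration. Clause marking (feature 4) is not attempted.

```python
# Toy lexicon (hypothetical): each word maps to the classes it may belong to.
LEXICON = {
    "the": {"DET"},
    "new": {"ADJ"},
    "of": {"PREP"},
    "machine": {"NOUN"},
    "index": {"NOUN", "VERB"},
    "indexes": {"NOUN", "VERB"},
    "documents": {"NOUN", "VERB"},
}


def lookup(word):
    """Feature 1: dictionary lookup of the possible syntactic classes."""
    return LEXICON.get(word.lower(), {"UNK"})


def resolve(words):
    """Feature 2: pick one class per word, using the preceding tag as context."""
    tags = []
    for word in words:
        classes = lookup(word)
        if len(classes) == 1:
            tags.append(next(iter(classes)))
        else:
            # Assumed rule: a noun/verb homograph is taken as a noun after a
            # determiner, adjective, or preposition, and as a verb otherwise.
            prev = tags[-1] if tags else None
            tags.append("NOUN" if prev in {"DET", "ADJ", "PREP"} else "VERB")
    return tags


def segment(words, tags):
    """Feature 3: mark noun phrases (NP) and prepositional phrases (PP)."""
    segments, i, n = [], 0, len(words)
    while i < n:
        label = "PP" if tags[i] == "PREP" else "NP"
        k = i + 1 if label == "PP" else i
        if k < n and tags[k] == "DET":
            k += 1
        while k < n and tags[k] == "ADJ":
            k += 1
        end = k
        while end < n and tags[end] == "NOUN":
            end += 1
        if end > k:  # at least one noun was found, so emit the segment
            segments.append((label, " ".join(words[i:end])))
            i = end
        else:
            i += 1
    return segments


words = "The machine indexes the new documents".split()
tags = resolve(words)
print(list(zip(words, tags)))
print(segment(words, tags))
```

In this sketch "indexes" is resolved as a verb because it follows a noun, while "documents" is resolved as a noun because it follows an adjective. A realistic program would need a far larger dictionary and many more context rules.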
On a very preliminary basis, a limited set of word and phrase deletion rules was
set up and several sample documents were processed against them, yielding reductions
to about 35 percent of the original text. These results suggest that "syntactical filtering
criteria" might be applied to the improvement of modified derivative indexing techniques,
such as the word-frequency counting techniques, either by deleting syntactically insignifi-
cant parts of selected sentences, or by counting identical phrases rather than words. The
investigators conclude, however, that:
"A formal linguistic approach to the problems of natural language processing
promises to yield results vital to the success of automatic indexing and data
extraction. But the work required in such an approach will be quite arduous;
a long-range man-machine effort will be required to formulate practical
machine programs for indexing and abstracting." 1/
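The "syntactical filtering" idea discussed above, deleting syntactically insignificant words and then counting identical phrases rather than single words, can be sketched as follows. The source does not give the actual deletion rules, so the set of deletable classes, the pre-tagged input, and the function names are assumptions; the toy example makes no attempt to reproduce the 35 percent figure reported by the investigators.

```python
from collections import Counter

# Hypothetical deletion rule: words of these classes are treated as
# syntactically insignificant and removed.
DELETABLE = {"DET", "PREP", "CONJ", "ADV"}


def phrases_after_deletion(tagged):
    """Return the runs of retained words that lie between deleted words."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in DELETABLE:
            if current:
                phrases.append(" ".join(current))
                current = []
        else:
            current.append(word.lower())
    if current:
        phrases.append(" ".join(current))
    return phrases


def retained_fraction(tagged, phrases):
    """Fraction of the original words that survive the deletion rules."""
    kept = sum(len(p.split()) for p in phrases)
    return kept / len(tagged) if tagged else 0.0


# Pre-tagged toy input (the tags are assumed, not produced by a real analyzer).
tagged = [
    ("The", "DET"), ("automatic", "ADJ"), ("indexing", "NOUN"),
    ("of", "PREP"), ("documents", "NOUN"), ("and", "CONJ"),
    ("the", "DET"), ("automatic", "ADJ"), ("indexing", "NOUN"),
    ("of", "PREP"), ("abstracts", "NOUN"),
]

phrases = phrases_after_deletion(tagged)
phrase_counts = Counter(phrases)  # counts identical phrases, not single words
print(phrase_counts.most_common())
print(retained_fraction(tagged, phrases))
```

Counting at the phrase level lets "automatic indexing" be tallied as one unit, whereas a plain word-frequency count would score "automatic" and "indexing" separately.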
A final special case of linguistic data processing involving syntactic analysis is
that of Langevin and Owens. They claim:
"A critical review of the analysis work done on the Nuclear Test Ban Treaty
by use of the Multiple Path Syntactic Analyzer demonstrates that such a device
can, even at present, provide a powerful technique for the systematic discovery
of ambiguities in treaties and other documents. Because the analyzer operates
without bias from the overall context of the document, it may sometimes be
possible for it to discover ambiguities that would easily escape a human reviewer
who knows what the document is 'supposed to say'." 2/
1/ Climenson et al., 1961 [133], p. 182.
2/ Langevin and Owens, 1963 [346], p. 26.