NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
Mary Elizabeth Stevens
National Bureau of Standards
"The study of the position of keywords in the text and the syntactical
relationship which exists among them will show the way to automatic
abstracting and the use of more sophisticated retrieval systems." 1/
Plath suggests that, given a computer program to perform the parsing and
syntactic diagramming of a text sentence, the results can serve quite usefully to augment
the selection criteria based initially on statistical techniques, such as word-frequency
counting. He says, for example:
"Another possible application of the outputs of the sentence diagramming program
is their employment as an aid in language data processing for purposes of
information retrieval, particularly in systems for automatic literature abstracting
of the sort proposed by Luhn (1958). The feature of the tree diagrams which is
pertinent here is that the main components of a clause, including subject, verb
and object, always correspond to the 'main topics' in an outline, and are therefore
located at the upper levels of the tree. When the words on these upper levels are
considered apart from the lower-level structures which modify them, they often
summarize the content of the sentence in a sort of 'newspaper headline' or
'telegraphic style'." 2/
In this approach, as also in Salton's, the problem of multi-level selection, or screening,
is treated as a cutting-off at a given depth in the analyzed syntactic structure, so that
machine programs for selecting the most probably significant words, phrases, or sentences
can be focused upon the most probably content-revelatory areas of text. A potentially
important contribution to the future prospects for automatic indexing, however, lies in
the "discourse analysis" and "transformational linguistics" approach of Harris
(1959 [254]), through which condensations of text, and concentrations of similarities
and differences of topical interest, may hopefully be achieved.
Harris himself suggested applications of his approach to both automatic indexing and
abstracting at least as early as 1958. A goal of the analyses he has proposed is to
identify 'kernels' of linguistic expression, after first bringing together, by various
transformations such as from passive to active voice, different ways of saying the same
thing. He then proposes machine operations not only to normalize text by applying his
transformational rules but also to determine:
"Which kernels have the same centers in different relations (e.g., with
different adjuncts), and other characterizing conditions. The results of this
comparison would indicate whether a kernel is to be rejected or transformed
into a section... of an adjoining kernel, or stored, and whether it is to be
indexed, and perhaps whether it is to be included in the abstract." 4/
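Harris's normalize-then-compare procedure can be loosely illustrated in Python. The sketch below assumes a drastically simplified kernel (a subject-verb-object center with a tuple of adjuncts), a single toy transformation from passive to active voice, and a hypothetical decision rule that marks a center for indexing when it recurs with different adjuncts. The names `Kernel`, `to_active`, and `group_by_center` are illustrative only and do not come from Harris's work.

```python
from __future__ import annotations

from collections import defaultdict
from typing import NamedTuple


class Kernel(NamedTuple):
    """A toy kernel: a subject-verb-object center plus optional adjuncts."""
    subject: str
    verb: str
    obj: str
    adjuncts: tuple[str, ...] = ()
    passive: bool = False  # if True, read as "subject is V-ed by obj"


def to_active(k: Kernel) -> Kernel:
    """Toy passive-to-active transformation: 'B is V-ed by A' becomes 'A V B'."""
    if not k.passive:
        return k
    return Kernel(k.obj, k.verb, k.subject, k.adjuncts, passive=False)


def group_by_center(kernels: list[Kernel]) -> dict[tuple[str, str, str], list[tuple[str, ...]]]:
    """Normalize each kernel to active voice, then collect the adjuncts seen with each center."""
    groups: defaultdict[tuple[str, str, str], list[tuple[str, ...]]] = defaultdict(list)
    for k in map(to_active, kernels):
        groups[(k.subject, k.verb, k.obj)].append(k.adjuncts)
    return dict(groups)


kernels = [
    Kernel("furnace", "heat", "sample", ("slowly",)),
    # "the sample is heated by the furnace to 500 degrees"
    Kernel("sample", "heat", "furnace", ("to 500 degrees",), passive=True),
    Kernel("analyst", "weigh", "sample"),
]
groups = group_by_center(kernels)

# Hypothetical rule: a center that recurs with different adjuncts becomes an index entry.
index_entries = [center for center, adjs in groups.items() if len(set(adjs)) > 1]
print(index_entries)  # [('furnace', 'heat', 'sample')]
```

Because the passive kernel is normalized before comparison, both statements about the furnace heating the sample fall under one center, which is the effect the transformations are meant to achieve.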
1/ Levery, 1963 [359], p. 236.
2/ Plath, 1962 [474], pp. 189-190.
3/ See also Thorne, 1962 [605], p. v: "The approach followed requires that the computer
itself syntactically analyse input text in order to convert it into special form called
FLEX, which preserves only that syntactic information which is useful for data retrieval
purposes."
4/ Harris, 1958 [254], p. 949.