MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Derivative indexing, whether by man or machine, is thus subject to many disadvan-
tages. First and foremost, it is constrained by a particular individual's personal manner
of expression of concepts in language. This limitation is controlled only by his presump-
tive desire to communicate with some particular (more or less general, or more or less
specialized) audience. His choices of natural language expressions, however, will be
conditioned by at least some of the following factors:
(1) The range and precision of his personal mastery of both general and
specialized vocabularies for a given time, place, and specialized field
of discourse.
(2) His personal expectations as to the probable reactions (in the sense of
effective communication) of his intended audience to the expressions that
he does choose, involving all of the problems of different usages of tech-
nical terminology from field to field, from formal to informal presenta-
tions, from scholarly reviews to progress reports heavy in current
"technese" and "fashionable words".
(3) His habits of thought and his training in his field.
(4) His awareness of more than one possible audience and of more than one
point or topic of potential interest to his readers.
Secondly, indexing by the author's own words is remarkably sensitive to a particular
period of time, so that the terminology becomes rapidly outdated and often seriously mis-
leading in its connotations. Thirdly, the user has no advance knowledge of the terminology
that has been used in all the varied texts of a collection and he must therefore be able to
predict a wide variety of possible ways of expressing ideas in words, phrases, and even
by implication. Fourthly, for collections indexed on a word-derivative basis, there is
little or no possibility for generic searching. 1/ Finally, there is the more general
question, applicable to both derivative and assignment indexing, of how well a condensed
representation can ever serve the purposes of specific subject content recapture. In the
strict sense, only by the elimination of truly redundant information. But even this is a
relative matter. What is redundant for an author may not be so for several different po-
tential users of the reports or papers that this author writes. What is redundant for one
user is not necessarily so for others.
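The fourth disadvantage noted above, the lack of generic searching in a word-derivative index, can be illustrated with a minimal sketch. The three document titles, the function names, and the simple whitespace tokenization are all hypothetical and chosen only for illustration; they do not represent any particular system discussed in this report.

```python
from collections import defaultdict


def build_derivative_index(documents):
    """Map each lowercased word of a text to the ids of the documents that use it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            # Strip surrounding punctuation; the author's own words are the only index terms.
            index[word.strip(".,;:()\"'")].add(doc_id)
    return index


def search(index, term):
    """Return the sorted ids of documents whose text contains the exact term."""
    return sorted(index.get(term.lower(), set()))


# Hypothetical titles, assumed for illustration only.
documents = {
    "d1": "Corrosion of copper tubing",
    "d2": "Fatigue in aluminum alloys",
    "d3": "Survey of metals research",
}

index = build_derivative_index(documents)
generic_hits = search(index, "metals")   # finds only the title that uses the word itself
specific_hits = search(index, "copper")  # finds the title that names the specific metal
```

In this sketch a query on the generic term "metals" retrieves only the title that happens to contain that word; the titles on copper and aluminum are missed because nothing in a purely derived index relates a specific term to its genus.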
The further problem for machine techniques is therefore: how selection rules can
be provided that will replicate a given human pattern of selectivity, or, alternatively, how
selection rules can be established and defined that will produce an equivalent and compar-
able result - that is, one which typical users would agree is as pertinent to their query-
answer relevance decisions as any available alternative.
Certainly the problem of appropriate selection is at the heart of the matter. This is
a crucial question, even if we sort out and can specify the different uses, for a particular
collection, a particular clientele, at a particular time, that automatically generated con-
densed document representations may have. Wyllys, in appraising automatic abstracting
efforts, considers that the goal should be to provide extracts which will serve a search-
tool function; that is, they will furnish the searcher with enough information about the
document content so that he may decide whether it is probably pertinent to his then interests
or not and hence decide whether or not to read the document in full. By contrast, he says
of the "content-revelatory function" that an abstract should: "furnish the reader with
enough information about the related document so that in most cases he will not need to
read it itself." 2/
1/ See, for example, Doyle, 1963 [162], with respect to lack of capacity for generic
searching as one of the major disadvantages of natural text search systems.
2/ Wyllys, 1963 [653], p. 6.