MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Appendix B: Progress and Prospects in Mechanized Indexing
appendix
Mary Elizabeth Stevens
National Bureau of Standards
We may thus conclude that the progress and prospects of automatic indexing, as of
September 1966, are both provocative and challenging. They are "provocative" because so
much in terms of both practical and theoretical accomplishment has already been
demonstrated, and "challenging" because so much remains to be done. Further, what remains
to be done will in all probability require serious, intensive, and imaginative investigations
of a wide variety of questions from the relative usage and acceptability of a KWIC index
through possible changes in author and editor practices to the fundamental questions of
semantics and human judgment.
Nevertheless, when the results of automatic classification or automatic indexing
procedures reach levels of 70 percent or better mean agreement either with human in-
dexers or with potential users evaluating the relevance of items retrieved by such indexing,
then the machine methods should be preferred to routine, run-of-the-mill, manual indexing
wherever the costs are at least commensurate.
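The criterion just stated can be illustrated by a minimal sketch. The measure of agreement used here (the fraction of the human-assigned terms for an item that the machine also assigned, averaged over items) is an assumption made for illustration only, since the text does not fix a particular measure. The function names and the cost figures are likewise hypothetical.

```python
def item_agreement(machine_terms, human_terms):
    """Fraction of the human-assigned terms that the machine also assigned.

    Assumed measure of agreement; other measures are equally possible.
    """
    human = set(human_terms)
    if not human:
        return 0.0
    return len(human & set(machine_terms)) / len(human)


def mean_agreement(machine, human):
    """Mean per-item agreement over the items present in both assignments."""
    shared = sorted(set(machine) & set(human))
    if not shared:
        raise ValueError("no items in common")
    return sum(item_agreement(machine[i], human[i]) for i in shared) / len(shared)


def prefer_machine(machine, human, machine_cost, manual_cost, threshold=0.70):
    """Apply the rule: 70 percent or better mean agreement, and machine
    costs no greater than manual costs (one reading of 'commensurate')."""
    return mean_agreement(machine, human) >= threshold and machine_cost <= manual_cost


# Hypothetical term assignments for two items.
machine = {"d1": ["indexing", "kwic"], "d2": ["semantics"]}
human = {"d1": ["indexing", "kwic"], "d2": ["semantics", "classification"]}

score = mean_agreement(machine, human)
decision = prefer_machine(machine, human, machine_cost=1.0, manual_cost=1.0)
```

In this example the first item agrees fully and the second agrees on one of two terms, so the mean agreement is 0.75 and the rule favors the machine method at equal cost.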
The technical feasibility of achieving such performance levels for a relatively small
number of classification categories or a relatively small vocabulary of index terms has
already been demonstrated experimentally. There remain unresolved questions of the
extent to which it will be possible to apply such techniques to the larger vocabulary
requirements and the practical operating considerations in actual collections.
Assuming that we can solve these problems, however, many advantages will accrue.
First is the speed with which many items can be indexed: in a few minutes or hours at
most for, say, 10,000 items. Secondly, there are advantages of timeliness and the ease
with which an entire collection can be re-indexed or re-classified. A third advantage is
the consistency of the machine procedures, especially as compared with the inconsistency
to be noted in available data on tests of comparative performance among indexers.
The advantage of ability to re-index quickly, easily, and inexpensively (because most
input costs will have been incurred previously) is of major importance in terms of over-
coming present barriers to the introduction of improvements in operating systems (since,
as Kyle 1/ points out, "The most common reason for not trying new and/or improved
techniques of classification and indexing is the difficulty of reclassifying and re-indexing
large collections") and in terms of dynamic revision and up-dating (as Borko 37/
emphasizes).
Another advantage, particularly of methods using teaching samples, is (as suggested by
Mooers as early as 1959 52/), the capability for making assignments of indexing terms in,
say, an English language system to items whose texts are written in other languages:
French, German, or Russian. This type of advantage can point the way to greater interna-
tional collaboration in indexing and document control procedures.
A further possibility is suggested by the convergence of automatic indexing techniques
based upon teaching samples with adaptive selective dissemination systems and client
feedback possibilities, especially those involving "more-like-this" requests. If we assume a
large-scale, multiple-access system with adequate personalized files for the typical client,
the common data bank of document identificatory and selection criteria, condensed rep-
resentations, and full text (if available) can be selectively accessed by him on the basis of
automatic indexing generated by his own choice of selection criteria and his own choice of
exemplar items for each such criterion.
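A minimal sketch of such exemplar-based selection follows. It assumes, purely for illustration, that each selection criterion is represented by the pooled word counts of the client's exemplar items and that a new item is matched to a criterion by cosine similarity of word counts. The text prescribes no particular technique, and the function names, threshold, and sample texts are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Word-count representation of a text (lowercased, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def build_profiles(exemplars):
    """Pool the client's exemplar items into one profile per criterion."""
    profiles = {}
    for criterion, texts in exemplars.items():
        profile = Counter()
        for text in texts:
            profile.update(vectorize(text))
        profiles[criterion] = profile
    return profiles


def select(document, profiles, threshold=0.3):
    """Criteria whose profile is similar enough to the document (assumed cutoff)."""
    doc = vectorize(document)
    return sorted(c for c, p in profiles.items() if cosine(doc, p) >= threshold)


# Hypothetical client file: exemplar items chosen for each selection criterion.
exemplars = {
    "indexing": ["automatic indexing of documents", "keyword indexing methods"],
    "translation": ["machine translation of russian text"],
}
profiles = build_profiles(exemplars)
matched = select("automatic keyword indexing", profiles)
```

A "more-like-this" request could then be handled, under the same assumptions, by adding the requested item to the exemplar list for the relevant criterion and rebuilding the profiles.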