MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Appendix B: Progress and Prospects in Mechanized Indexing
appendix
Mary Elizabeth Stevens
National Bureau of Standards
Assuming, however, that the input processing problems have been solved, we may ask
what machines can do with respect to words in texts, or in portions of texts, that are
available in machine-useful form? The machines can "read" the words for purposes of shifting
and sorting and can copy or reproduce the words in some desired order, as in a machine-
prepared concordance. Machines can match input words with words already in store and
thus exclude input words from further machine consideration (as by stoplists in KWIC
(Keyword-in-Context) and other forms of derivative indexing) or stress certain input words
with reference to a selective "inclusion" dictionary.
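
The matching and sorting operations just described can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reconstruction of any system cited in this report: the word lists STOPLIST and INCLUSION, and the function names tokenize, filter_words, and concordance, are hypothetical.

```python
import re

# Hypothetical word lists, chosen only for illustration.
STOPLIST = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
INCLUSION = {"indexing", "retrieval"}


def tokenize(text):
    """Split text into lowercase word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def filter_words(text):
    """Drop stoplisted words and flag words found in the inclusion dictionary."""
    kept = []
    for word in tokenize(text):
        if word in STOPLIST:
            continue  # excluded from further machine consideration
        kept.append((word, word in INCLUSION))
    return kept


def concordance(text):
    """Map each non-stoplisted word to its word positions, in alphabetical order."""
    positions = {}
    for i, word in enumerate(tokenize(text)):
        if word not in STOPLIST:
            positions.setdefault(word, []).append(i)
    return dict(sorted(positions.items()))
```

Here the stoplist acts as an exclusion filter, while the inclusion dictionary only marks words for emphasis; a real system might instead keep nothing but the included words.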
Next, machines can tabulate and count, so that both absolute and relative word fre-
quency data may be applied to either indexing or search-selection algorithms. Measure-
ments of sequential distances between selected words in the input text may also be applied.
Machine look-ups against a master vocabulary can automatically supply syndetic
information, synonym reduction, lexical normalization, generic-specific subsumption, and
data with respect to previously observed word-word or word-subject co-occurrences. In
addition, information can be provided as to the possible syntactic roles of input words.
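
The counting, distance, and look-up operations in this paragraph might be sketched as follows. The sketch is an assumption-laden illustration: MASTER_VOCABULARY is a hypothetical three-entry table, and the function names are invented for this example rather than taken from any cited system.

```python
from collections import Counter

# Hypothetical master vocabulary mapping variant forms to a preferred term.
# "computers" -> "computer" illustrates lexical normalization;
# "cars" -> "automobile" illustrates synonym reduction.
MASTER_VOCABULARY = {
    "computers": "computer",
    "automobiles": "automobile",
    "cars": "automobile",
}


def word_frequencies(words):
    """Return {word: (absolute count, relative frequency)} for a word list."""
    counts = Counter(words)
    total = len(words)
    return {word: (n, n / total) for word, n in counts.items()}


def min_distance(words, first, second):
    """Smallest difference in word position between two words, or None if absent."""
    pos_a = [i for i, w in enumerate(words) if w == first]
    pos_b = [i for i, w in enumerate(words) if w == second]
    if not pos_a or not pos_b:
        return None
    return min(abs(a - b) for a in pos_a for b in pos_b)


def normalize(words):
    """Replace each word by its preferred form when the vocabulary lists one."""
    return [MASTER_VOCABULARY.get(w, w) for w in words]
```

Normalizing before counting would merge the frequencies of variant forms, which is one plausible way such look-ups could feed an indexing or search-selection rule.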
In the light of such machine capabilities, what can be said of the present state of the
art in automatic indexing? Automatic indexing in the sense of machine-prepared indexes
that are generated by the automatic extraction and manipulation of keywords, especially
from titles, is of course widely used in KWIC indexes such as Chemical Titles and many
others both in the United States and elsewhere.
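
As a rough illustration of keyword extraction from titles, the following Python sketch produces KWIC-style entries. It assumes a small hypothetical stoplist and a simplified layout; it is not the procedure used for Chemical Titles or any other published index, and the names kwic_entries and format_kwic are invented here.

```python
# Hypothetical stoplist, chosen only for illustration.
STOPLIST = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def kwic_entries(title, stoplist=STOPLIST):
    """Return (sort key, left context, keyword, right context) per keyword."""
    words = title.split()
    entries = []
    for i, word in enumerate(words):
        key = word.strip('.,;:()"').lower()
        if not key or key in stoplist:
            continue  # stoplisted words never become index entries
        left = " ".join(words[:i])
        right = " ".join(words[i + 1:])
        entries.append((key, left, word, right))
    entries.sort(key=lambda entry: entry[0])  # alphabetical by keyword
    return entries


def format_kwic(entries, width=24):
    """Align every keyword in the same column, with context on both sides."""
    lines = []
    for _key, left, word, right in entries:
        left_context = left[-width:].rjust(width) if width > 0 else ""
        lines.append(f"{left_context}  {word} {right}".rstrip())
    return lines
```

Calling format_kwic(kwic_entries(title)) for each title and merging the sorted lines would give a listing in which each title appears once per significant word.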
Fischer 22/ provides a retrospective view of KWIC indexing concepts, including
variants like KWOC (Keyword out of Context) and WADEX (Words and Authors Index to
Applied Mechanics Review). She stresses the potentialities of linking such extraction
indexing to selective dissemination systems and concludes: "Plans for using the 'Echo'
satellites to link information centers around the world, in a world wide drive toward
immediacy in information dispersion, will surely provide a place for KWIC indexes and for
the KWIC concept." Warheit 23/ also reports that consideration is being given to combining
selective dissemination systems and KWIC. Fundamental questions remain: How useful
and how much used are KWIC and other machine-generated indexes based upon the extrac-
tion of words from a limited portion of the author's own text?
These questions relate to an important distinction between two quite different types of
indexing. The distinction is that whereas "derived" indexing takes as index entries the
author's own words in the title, the abstract or the full text, in "assignment" indexing an
index term, descriptor, subject heading, or classification code is assigned to a document
as an indicator of content and the term assigned does not need to be identical with any of
the author's own words.
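
The distinction can be made concrete with a small Python sketch. Everything in it is assumed for illustration: the clue-word table ASSIGNMENT_RULES, the descriptors it assigns, and the stoplist are hypothetical, and a simple look-up table stands in for whatever assignment procedure a working system would use.

```python
# Hypothetical stoplist and clue-word table, chosen only for illustration.
STOPLIST = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}
ASSIGNMENT_RULES = {
    "kwic": "PERMUTED INDEXES",
    "concordance": "PERMUTED INDEXES",
    "satellite": "COMMUNICATION SYSTEMS",
}


def _words(text):
    """Lowercase the words of a text and strip surrounding punctuation."""
    stripped = (w.strip('.,;:()"').lower() for w in text.split())
    return [w for w in stripped if w]


def derived_index(text):
    """Derived indexing: the entries are the author's own words."""
    return sorted({w for w in _words(text) if w not in STOPLIST})


def assigned_index(text):
    """Assignment indexing: the entries come from a controlled list of descriptors."""
    return sorted({ASSIGNMENT_RULES[w] for w in _words(text) if w in ASSIGNMENT_RULES})
```

For the text "A KWIC concordance of titles", the derived entries are words that occur in the text, whereas the single assigned descriptor, PERMUTED INDEXES, appears nowhere in it.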
We can report continuing progress in use of derivative indexing techniques such as
KWIC, and also in experiments with automatic assignment indexing and automatic subject
classification. Timeliness of index production is certainly one of the major virtues of
KWIC. A similar timeliness is promised for automatic assignment indexing techniques
provided that requirements can be kept sufficiently low with respect both to keystroking
and computer processing.
Intermediate results may be achieved by pre-editing, normalization, and post-editing
techniques. Manual pre-editing to modify and supplement keywords in title, abstract, or
portions of text has been used in permuted title and KWIC-type indexing from the punched
card system that began operation in 1952 24/ to the "notation-of-content" system developed
for NASA 25/. Kreithen 26/ suggests a combination of derivative and assignment indexing,
as follows: