MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
at Rutgers in indexing of a book by computer programs (1963 [20] and [22]) is an example
of such modified derivative indexing. Specifically, Artandi's method involves:
(1) Establishment of a list of key terms appropriate to a given subject
area to be used as an inclusion list for word extractions from text.
(2) Application of an appropriate syndetic apparatus to be used in the
compilation and ordering of the index entries.
(3) Means for the automatic selection of index entries other than those
on the pre-specified inclusion list, especially for the selection of
proper names.
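
Steps (1) and (3) above can be illustrated with a minimal sketch. This is not Artandi's program: the inclusion list, the function name `select_entries`, and the rule that a capitalized word not at the start of a sentence is a candidate proper name are all assumptions made for illustration only.

```python
import re

# Hypothetical inclusion list for a chemistry text (illustrative only).
INCLUSION_LIST = {"chlorine", "bromine", "iodine", "fluorine"}


def select_entries(pages):
    """Map each selected term to the sorted list of pages it occurs on.

    pages: dict of page number -> page text.
    """
    index = {}
    for page, text in pages.items():
        # Crude sentence split, so sentence-initial capitals can be skipped.
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = re.findall(r"[A-Za-z]+", sentence)
            for pos, word in enumerate(words):
                lower = word.lower()
                if lower in INCLUSION_LIST:
                    # Step (1): the word is on the pre-specified list.
                    index.setdefault(lower, set()).add(page)
                elif pos > 0 and word[0].isupper():
                    # Step (3), assumed rule: a capitalized word inside a
                    # sentence is treated as a candidate proper name.
                    index.setdefault(word, set()).add(page)
    ordered = sorted(index.items(), key=lambda item: item[0].lower())
    return {term: sorted(page_set) for term, page_set in ordered}
```

The capitalization rule stands in for the special tags that marked capitalized words at keypunching time; a real selection routine would need further screening of the candidates.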
The text used by Artandi for her study consisted of a 59-page chapter on halogens
from J. W. Mellor's Modern Inorganic Chemistry. This text was keypunched with
special tags being assigned to indicate the page numbers and the incidence of capitalized
words in the text. Text words greater than three characters in length were first checked
against the inclusion dictionary of "detection terms". There was, in addition, an
"expression term" dictionary which constituted the vocabulary of the final index and in
which a given expression term might or might not be identical with the corresponding
detection term. Cross-references were supplied by a program routine which checks the
index term list against a list of expression terms with their detection terms grouped
under them and which compiles cross-reference entries, one for each detection term
associated with an expression term appearing on the index list.
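
The two-dictionary arrangement and the cross-reference routine described above might be sketched as follows. The dictionary contents and the names `EXPRESSION_OF`, `build_index`, and `cross_references` are hypothetical, and skipping a detection term that is identical to its expression term is an assumption rather than something the report states.

```python
# Hypothetical dictionary: detection term -> expression term.
# A detection term may or may not be identical to its expression term.
EXPRESSION_OF = {
    "chlorine": "chlorine",
    "muriatic": "hydrochloric acid",
    "hydrochloric": "hydrochloric acid",
    "bleaching": "bleaching powder",
}


def build_index(tagged_words):
    """Build expression term -> sorted pages from (page, word) pairs."""
    index = {}
    for page, word in tagged_words:
        lower = word.lower()
        # Only words longer than three characters are looked up.
        if len(lower) > 3 and lower in EXPRESSION_OF:
            index.setdefault(EXPRESSION_OF[lower], set()).add(page)
    return {term: sorted(page_set) for term, page_set in index.items()}


def cross_references(index):
    """Return (detection term, expression term) pairs, one for each
    detection term whose expression term appears in the index."""
    refs = []
    for detection, expression in EXPRESSION_OF.items():
        # Assumption: a term is not cross-referenced to itself.
        if expression in index and detection != expression:
            refs.append((detection, expression))
    return sorted(refs)
```

Each pair returned by `cross_references` would be printed as an entry of the form "muriatic, see hydrochloric acid".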
For her experimental corpus, Artandi's program developed 363 page references,
138 different index entries and 35 cross-references. She compared these results with
those obtainable by conventional human indexing with respect to the factors of heading
density (ratio of number of entries to number of words in the book), entry density (ratio
of the number of page references to the number of pages), and distribution (ratios of
entries for chemical compounds, proper names, and subject entries to the total number
of entries). No indexing errors were found in the computer-generated index for a 5
percent random sample of the pages of the corpus, but five omissions were found in the
machine indexing of these sample pages. Artandi concluded, however, that although the
quality of indexing appeared favorable, the costs, which approximated $1.50 per page
indexed, were impractically high.
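
The three comparison factors defined above are simple ratios, sketched below. The function names are hypothetical. The entry density shown is arithmetic on the figures quoted in this section (363 page references, 59 pages), not a value reported by Artandi; the word count of the corpus is not given here, so heading density is left as a function only.

```python
def heading_density(num_entries, num_words):
    """Ratio of number of entries to number of words in the book."""
    return num_entries / num_words


def entry_density(num_page_refs, num_pages):
    """Ratio of number of page references to number of pages."""
    return num_page_refs / num_pages


def distribution(counts_by_kind):
    """Share of each kind of entry (for example chemical compounds,
    proper names, subject entries) in the total number of entries."""
    total = sum(counts_by_kind.values())
    return {kind: count / total for kind, count in counts_by_kind.items()}


# 363 page references over the 59-page chapter.
print(round(entry_density(363, 59), 2))  # 6.15
```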
Book indexing by computer has also been investigated by Maloney, Dukes, and Green
at the Army Biological Laboratories, Fort Detrick, Maryland. 1/ Input is based on the
by-product paper tape generated when the manuscript is typed on a tape typewriter. The
paper tape is in turn converted to punched cards, which are then processed by a UNIVAC
SS-90 II computer in an editing run that deletes unrecognizable codes and then stores page,
1/
C. J. Maloney, private communication. A report by C. J. Maloney, J. Dukes, and
S. Green, "Indexing reports by computer" is in process of preparation for
publication.