MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Operational Considerations
chapter
Mary Elizabeth Stevens
National Bureau of Standards
on that word ... If there are four different Double types of which the first word is
`external' the addresses of the four different second words form a new list which is
linked to the entry for `external'. Each word type occurs only once in core, and all
word pairs of which it is a member refer to it by means of its core addresses.
"The program could process millions of words, automatically generating frequency
counts far larger than the Thorndike and Lorge counts, which cost many man-years,
and in addition, FEAT would provide complete lists of word pairs (Doubles and
Reverses), which, so far as we know, have never been counted in a sample of
appreciable size, despite their importance for semantic analysis of text."
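The storage scheme described in the quotation can be illustrated with a minimal sketch in modern Python. This is not the SDC implementation: the class name `WordPairCounter` and its methods are hypothetical, integer indexes stand in for core addresses, and a "Double" is assumed here to mean a pair of adjacent words. "Reverses" are not modeled, since the passage does not define them.

```python
from collections import Counter, defaultdict


class WordPairCounter:
    """Counts word types and adjacent word pairs, storing each type once.

    Hypothetical illustration: an integer index plays the role of the
    core address by which every pair refers to a word type.
    """

    def __init__(self):
        self.index_of = {}            # word type -> integer index
        self.words = []               # integer index -> word type
        self.word_counts = Counter()  # index -> frequency of the word type
        # first-word index -> Counter of second-word indexes
        self.followers = defaultdict(Counter)

    def _intern(self, word):
        # Store each word type only once and hand back its index.
        if word not in self.index_of:
            self.index_of[word] = len(self.words)
            self.words.append(word)
        return self.index_of[word]

    def add_text(self, text):
        ids = [self._intern(w) for w in text.lower().split()]
        for i in ids:
            self.word_counts[i] += 1
        # Each adjacent pair is recorded under the index of its first word.
        for first, second in zip(ids, ids[1:]):
            self.followers[first][second] += 1

    def second_words(self, first_word):
        """Return {second word: count} for pairs starting with first_word."""
        idx = self.index_of.get(first_word)
        if idx is None:
            return {}
        return {self.words[j]: n for j, n in self.followers[idx].items()}
```

Calling `second_words("external")` after loading a text returns the list of second words linked to the entry for that first word, which is the lookup the quoted passage describes.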
FEAT is used, together with a modified version of the Proto-Synthex program and
special output formatting routines, for another SDC program, the Descriptor Word Index
Program, which produces a content-word concordance for natural language text as well
as statistics reflecting the types of words that occur, frequencies of occurrence, and
positional data (Olney, 1960 [457], 1961 [456]; Stone, 1962 [574]).
The IPL-V list-processing language is used by Kochen in some of his work on sim-
ulated concept processing by machine. Programs for accepting sentences written in a
formal language which was constructed of names and logical predicates (inserted either
from a console or in the form of punched cards), for updating and re-organizing a file of
such sentences, for storing and manipulating metalinguistic sentences such as "If X is
author of Y and Y pertains to topic Z, then X has worked on Topic Z", for interrogating
the file, and for tracing associations between names linked through various predicates,
have been written in this language. 1/
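The kind of metalinguistic sentence quoted above can be sketched as a rule applied to a small file of facts. The following is a minimal illustration in Python, not a reconstruction of the IPL-V programs: the function name `apply_authorship_rule`, the predicate names, and the tuple layout `(predicate, subject, object)` are all assumptions made for the example.

```python
def apply_authorship_rule(facts):
    """Derive ("worked_on", X, Z) facts from the rule:
    if X is author of Y and Y pertains to topic Z, then X has worked on Z.

    `facts` is a set of (predicate, subject, object) tuples. The predicate
    names "author_of", "pertains_to", and "worked_on" are hypothetical.
    Only facts not already present in the file are returned.
    """
    authored = [(x, y) for (p, x, y) in facts if p == "author_of"]
    topics = [(y, z) for (p, y, z) in facts if p == "pertains_to"]
    derived = set()
    for x, y in authored:
        for y2, z in topics:
            if y == y2:  # the same item links the author to the topic
                derived.add(("worked_on", x, z))
    return derived - set(facts)


# Usage with made-up names:
facts = {
    ("author_of", "Smith", "Paper1"),
    ("pertains_to", "Paper1", "indexing"),
    ("author_of", "Jones", "Paper2"),
}
new_facts = apply_authorship_rule(facts)
```

Tracing associations between names, as mentioned in the text, would amount to following such links repeatedly through several predicates.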
8.3 Output Considerations
Turning to operational problems of output, the limitation of computer printout, in
most cases, to a single set of upper case alphabetic characters, numerals, and a few
special symbols, 2/ is a serious factor in customer acceptance with respect to
appearance -- format, legibility, readability. Involved here are questions previously
mentioned. Where on the page, in the KWIC type permuted title indexes (the only
machine-generated indexes presently available as output), should the indexing access
point "slot" be placed? Should all or only part of the title be displayed? Should 60- or
106-character lines be used? More detailed discussions of these and related points are
provided by, for example, Youden (1963 [658]), Kennedy (1962 [311]), and Brandenberg
(1963 [80]).
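The layout questions raised here (position of the access "slot", full or partial title, line length) can be made concrete with a minimal sketch of a KWIC-style permuted title index in Python. The function name `kwic_lines`, the stop list, and the default slot position are assumptions chosen for illustration and are not taken from any of the systems cited.

```python
# A small, hypothetical stop list; real systems used much longer ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to"}


def kwic_lines(title, width=60, slot=None):
    """Return sorted (keyword, line) pairs, one per content word of the title.

    Each line is exactly `width` characters. The keyword always begins at
    column `slot` (0-based, at least 1), and whatever part of the title does
    not fit on either side of it is cut off.
    """
    if slot is None:
        slot = width // 2
    if not 1 <= slot < width:
        raise ValueError("slot must satisfy 1 <= slot < width")
    words = title.split()
    lines = []
    for i, word in enumerate(words):
        if word.lower() in STOP_WORDS:
            continue
        left = " ".join(words[:i])
        right = " ".join(words[i:])
        # Keep only the tail of the left context that fits before the slot,
        # leaving one blank column between context and keyword.
        left = left[-(slot - 1):] if slot > 1 else ""
        line = left.rjust(slot - 1) + " " + right
        lines.append((word.lower(), line[:width].ljust(width)))
    return sorted(lines)


for keyword, line in kwic_lines("Automatic Indexing of Technical Literature"):
    print(line)
```

Changing `width` from 60 to 106 shows directly how much more of a title survives truncation, and moving `slot` shifts the balance between the context shown before and after the keyword.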
A separate but related question is how much identification, and in what form,
should be provided for the item itself either directly as a part of the index entry or by
cross-reference to the address of more detailed information. There seems to be quite
general agreement that the typical user needs something more than author's name and title
1/ Kochen, et al., 1962 [328], p. 34.
2/ See, for example, Lipetz, 1960 [365], p. 252: "A disadvantage of keypunched cards
however, is the lack of capacity to record or to print other symbols than a one-case
alphabet, one case of arabic numerals, and about a dozen punctuation marks and
miscellaneous symbols. Citations in the scientific literature generally make use of
a much larger number of significant symbols: multiple cases, multiple fonts, italics,
boldface, Greek letters, mathematical symbols, etc." Note, however, that
Chemical-Biological Activities, a digest produced by Chemical Abstracts Service, uses
printouts of the modified IBM 1403 chain printer, using 120 characters (see Fig. 5).