MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
under the general heading of the association map technique, 1/ although passing reference
has been made to some of Doyle's suggestions and findings elsewhere in this report.
Beginning in 1958 (Doyle, 1959 [168]) information retrieval projects at the System
Development Corporation have had, among other objectives, that of developing ways to
use computers in the processing and interpretation of natural language text. By February
of 1959, a computer program was already in operation that could search fragments of
about 100 words of keypunched text, match input words against a pre-established clue word
selection list (i.e., an inclusion dictionary) and substitute a short encoded form to be
used for subsequent search. Processing of keypunched abstracts using this program
involved computer time at the rate of four abstracts per second.
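The matching-and-substitution step described above can be illustrated with a minimal modern sketch. It is not a reconstruction of the SDC program: the clue words, the code values, and the names CLUE_CODES and encode_fragment are all invented here for illustration, and the source does not describe the actual encoding scheme.

```python
import re

# Hypothetical inclusion dictionary: clue word -> short encoded form.
# Both the words and the codes are invented for this illustration.
CLUE_CODES = {
    "computer": "C01",
    "retrieval": "R01",
    "index": "I01",
    "language": "L01",
}

def encode_fragment(text, clue_codes=CLUE_CODES):
    """Return the codes of the clue words found in a text fragment,
    in order of appearance; words not in the dictionary are dropped."""
    words = re.findall(r"[a-z]+", text.lower())
    return [clue_codes[w] for w in words if w in clue_codes]

fragment = "Use of a computer in the retrieval of natural language text."
print(encode_fragment(fragment))
```

Only words on the selection list survive, so the encoded form is much shorter than the fragment and can be searched without rescanning the full text.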
Other features of this text compiler, and of subsequent text processing programs
developed at SDC, enable the making of frequency counts and other statistical measures.
Such features are then used for the investigation of, for example, word-word, word-
document, and word-subject associations, looking toward the determination of answers to
such questions as: "Do subject words have distribution characteristics within a library
that a computer program can detect?"
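As a rough sketch of the kind of counting involved, the following computes word frequencies and word-word co-occurrence counts over a toy collection. The three sample documents and the names tokenize and count_statistics are assumptions made for this example, and the statistical measures actually used at SDC are not specified in the text.

```python
import re
from collections import Counter
from itertools import combinations

def tokenize(text):
    """Split text into lowercase alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def count_statistics(documents):
    """Return (word_freq, pair_docs) for a list of document strings.

    word_freq: total occurrences of each word across the collection.
    pair_docs: number of documents in which each unordered word pair
               occurs together (a simple word-word association count).
    """
    word_freq = Counter()
    pair_docs = Counter()
    for doc in documents:
        words = tokenize(doc)
        word_freq.update(words)
        # Sort the distinct words so each pair has one canonical order.
        for pair in combinations(sorted(set(words)), 2):
            pair_docs[pair] += 1
    return word_freq, pair_docs

# Toy collection, invented for illustration.
docs = [
    "computer indexing of text",
    "computer retrieval of text",
    "manual indexing",
]
word_freq, pair_docs = count_statistics(docs)
print(word_freq["computer"])
print(pair_docs[("computer", "text")])
```

No exclusion list is applied in this sketch, so function words such as "of" are counted along with subject words; a practical run would presumably filter them first.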
Doyle's investigations of word co-occurrences have included hypotheses and tests
of various probabilistic measures in terms of observed frequencies, 2/ in terms of "boing"
words (so-called because of the mental sound effect they elicit), 3/ in terms of adjacent
word pairs and affinities between particular nouns and particular adjectives, 4/ and in
terms of distinctions between frequency (the total number of times a word appears in a
given library corpus) and prevalence (the total number of items in which a particular word
appears). 5/ He has also stressed distinctions between adjacent words and high
correlations for words that are not closely positioned together in text, as follows:
1/ Compare Doyle himself, 1962 [163], p. 383: "Swanson and others have offered
thesauri of synonyms and related terms... (to assist in indexing or search
processes)... An association map is, in a sense, an extension of this solution; it is
a gigantic, automatically derived thesaurus. Confronted by such a map, the
searcher has a much better 'association network' than the one existing in his mind,
because it corresponds to words actually found in the library, and, therefore, words
which are best suited to retrieve information from that library." See also Wyllys,
1962 [651], p. 16: "L. B. Doyle (1961) has invented a fascinating search tool which
seems to us to belong at a level intermediate between automatic indexes and auto-
matic abstracts; i.e., a possible search method might be to have the computer scan
automatic indexes and compare the index terms therein with the request, then
obtain the possibly pertinent documents and display their association map for the
user to examine..."
2/ Doyle, 1959 [168], p. 6.
3/ Doyle, 1959 [165], p. 5.
4/ Doyle, 1961 [169], p. 12; 1959 [165], p. 16.
5/ Doyle, 1962 [163], p. 380.