MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
considered. Translating criteria into these key terms is a process of normalization
which will eliminate many disagreements in the choice of specific terms amongst
recorders, amongst inquirers, and amongst the two groups, by merging the terms
at issue into a single key term. However, the dictionary does not classify or index
but maintains the idea of being fields... A specific term may appear under the
heading of several key terms and if according to its application an overlapping of
concepts exists then the term is represented by the several key terms
involved..." 1/
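The normalization described in this passage can be illustrated with a minimal sketch in Python. The dictionary entries and the names KEY_TERMS and normalize are hypothetical, invented here for illustration and not taken from Luhn's paper. The sketch assumes only what the quotation states: each specific term is represented by one or more key terms, and a term with overlapping concepts is represented by several.

```python
# Hypothetical dictionary: each specific term lists the key terms it falls under.
# All entries are invented for illustration only.
KEY_TERMS = {
    "automobile": ["vehicle"],
    "car": ["vehicle"],
    "truck": ["vehicle"],
    "ambulance": ["vehicle", "medicine"],  # overlapping concepts: several key terms
}


def normalize(terms):
    """Replace specific terms with key terms; unknown terms pass through lowercased."""
    result = set()
    for term in terms:
        key = term.lower()
        result.update(KEY_TERMS.get(key, [key]))
    return result


# A recorder and an inquirer choose different specific terms...
recorder = normalize(["Automobile", "ambulance"])
inquirer = normalize(["car"])
# ...but after normalization they meet on a shared key term.
shared = recorder & inquirer
```

Here the disagreement between "automobile" and "car" disappears because both merge into the same key term, while "ambulance" remains reachable under either of its two key terms.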
In subsequent papers, Luhn has developed related ideas of a "family of notions" and
"dictionaries of notional families". 2/ In particular, he emphasizes that for automatic
indexing, by contrast with automatic abstracting, consideration should be given to the
normalization of variations in author-chosen terminology: "It will be necessary for a
machine to resolve variation of word usage with the aid of a device the functions of which
resemble a dictionary at one level and of a thesaurus at another level of requirements." 3/
The first issue of the National Science Foundation's compendium of project state-
ments, "Current Research and Development in Scientific Documentation", which appeared
in July 1957 [430] reported several projects of interest in terms of thesaurus construction
and use, 4/ namely: (1) work by Luhn at IBM involving the establishment of a
thesaurus to facilitate encoding of items whose texts would be available in machine-usable
form, (2) work by Bernier and Heumann at Chemical Abstracts Service looking toward the
development of a technical thesaurus, (1957 [57]), and (3) an approach to mechanized
translation proposing to use a mechanized thesaurus at the Cambridge Language Research
Unit. This latter project incorporated the ideas of Masterman and her associates from
about 1956 on (Halliday, 1956 [249]; Masterman, 1956 [403]; Joyce and Needham, 1958
[305]), to apply the principle of checking co-occurrences of text words against thesaurus
"heads" to which they belonged, in order to resolve homographic ambiguities and thus
achieve more idiomatic translation by machine.
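The principle of checking co-occurring words against thesaurus heads can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually used at the Cambridge Language Research Unit: the thesaurus entries, the head names, the name HEADS, and the function resolve are all hypothetical, and the simple vote count is an assumed scoring rule.

```python
from collections import Counter

# Hypothetical thesaurus: each word lists the "heads" it belongs to.
# All entries are invented for illustration only.
HEADS = {
    "bank": {"FINANCE", "RIVER"},          # homograph: two candidate heads
    "loan": {"FINANCE"},
    "interest": {"FINANCE", "CURIOSITY"},
    "water": {"RIVER"},
}


def resolve(word, context):
    """Pick the head of `word` shared by the most context words, or None."""
    candidates = HEADS.get(word, set())
    votes = Counter()
    for other in context:
        if other == word:
            continue
        # A context word supports every candidate head it also belongs to.
        for head in HEADS.get(other, set()) & candidates:
            votes[head] += 1
    if not votes:
        return None  # no co-occurring word shares a head with `word`
    # Sort first so that ties are broken alphabetically and deterministically.
    return max(sorted(votes), key=lambda head: votes[head])


financial = resolve("bank", ["the", "bank", "approved", "the", "loan", "interest"])
riverside = resolve("bank", ["water", "near", "the", "bank"])
```

In the first call the words "loan" and "interest" both share the FINANCE head with "bank", so that sense is selected; in the second, only "water" contributes, and it supports RIVER.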
For the ICSI Conference in 1958, Masterman, Needham and Sparck-Jones prepared
a paper discussing analogies between machine translation and information retrieval, and
recapitulated the arguments of Needham and Joyce for the use of a thesaurus in the
formulation of search requests, as follows:
"[If] a large number of terms are used to describe a document, the existence of
synonyms is likely: in a system such as uniterm no attempt is made to bracket
the synonyms, which means that a request will produce only the document described
1/ Luhn, 1953 [383], p. 15.
2/ Luhn, 1959 [371], p. 51, 1959 [384]; 1957 [385], p. 316.
3/ Luhn, 1959 [384], p. 12.
4/ National Science Foundation's CR&D Report No. 1, [430], pp. 21, 6, 4.