MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Operational Considerations
chapter
Mary Elizabeth Stevens
National Bureau of Standards
assignment they had co-occurred), the machine had a sufficient basis in the input material
for the derivation of a selection-score for at least 12 descriptors for each new item. The
items were closely similar to, though not identical with, the source items from which the
word associations with descriptors assigned had been drawn. The sample is obviously
critically small. Nevertheless, the possibility that extensive clue word lists, notwith-
standing the incorporation of trivial and even erroneous associations, can be used as
effectively as smaller, more precise, and more carefully tailored lists, but with
significant gains in memory space or computational requirements, is suggestive. A somewhat
related conclusion, again reflecting the effect of processing requirements, is stated by
Needham as follows:
"The main point to be made is that theoretical elegance must be sacrificed to
computational possibility: there is no merit in a classification program which can only
be applied to a couple of hundred objects." 1/
In KWIC type derivative indexing by machine, except in terms of allowable character
sets and word-lengths conveniently processed, the problem of appropriate programming
languages does not arise to any serious extent. For the processing of material in research
on natural language text, however, the choice of interpretative and compiler types of auto-
matic programming languages may involve computational requirements which, while being
inappropriate in a production situation, offer considerable flexibility and versatility for
experimental purposes. Examples of special programs of this type include the use of
Yngve's COMIT by Baxendale and Knowlton, the development and use of FEAT by Olney,
Doyle, and others at SDC, and the use of list-processing techniques in the General Inquirer
system. 2/ Yngve describes the use of his program as follows:
"COMIT has also been used in the experimental work in information retrieval of
Baxendale and Knowlton at IBM. The purpose of their COMIT program was to accept
as input the title of a document and to produce as output, not only descriptors, but
pairs of descriptors which are roughly of the form adjective-noun. The purpose of
the work is to automatically generate, from document titles, retrieval words of a
more specific nature than simply Boolean functions of the existence of certain words
in a title." 3/
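
The kind of output Yngve describes, single descriptors plus descriptor pairs drawn
from a title, can be illustrated with a minimal sketch. This is not COMIT and does not
reproduce the Baxendale and Knowlton rules, which are not given here. It assumes that a
pair is simply two content words standing next to each other in the title, which only
roughly approximates the adjective-noun form. The stop list and the name
title_descriptors are hypothetical.

```python
import re

# Hypothetical stop list; the word lists of the original program are not given in the text.
FUNCTION_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}


def title_descriptors(title):
    """Return (single descriptors, adjacent descriptor pairs) for one title."""
    words = re.findall(r"[a-z]+", title.lower())
    singles = []
    pairs = []
    previous = None  # last content word in the current unbroken run
    for word in words:
        if word in FUNCTION_WORDS:
            previous = None  # a function word breaks the run, so no pair spans it
            continue
        singles.append(word)
        if previous is not None:
            # Assumption: adjacency stands in for the adjective-noun relation.
            pairs.append((previous, word))
        previous = word
    return singles, pairs


singles, pairs = title_descriptors("Automatic Indexing of Technical Reports")
print(singles)
print(pairs)
```

For the sample title the sketch pairs "automatic" with "indexing" and "technical" with
"reports", but forms no pair across the function word "of".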
The FEAT program was designed originally for word and significant-word-pair
frequency counts. Olney describes the program in part, as follows:
"FEAT is designed to perform frequency and summary counts of words and word
pairs occurring in its natural text input; i.e., text written in ordinary English and
transcribed into Hollerith code according to some set of keypunching rules. To
focus attention on the semantic aspects of word pairs rather than on their syntactic
aspect, pairs of which one member is a function word, such as `the', `is', `by',
etc., are excluded.
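
The counting described in the paragraph above can be sketched as follows. This is an
illustration only, not the FEAT program: it assumes that a "word pair" means two
adjacent words of the running text, that single-word counts include function words,
and that sentence boundaries are ignored. The stop list and the name
count_words_and_pairs are hypothetical.

```python
import re
from collections import Counter

# Illustrative function-word list; FEAT's actual list is not given in the text.
FUNCTION_WORDS = {"the", "is", "by", "a", "an", "of", "and", "in", "to"}


def count_words_and_pairs(text):
    """Count every word, and every adjacent pair containing no function word."""
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)
    # Assumption: pairs are adjacent words; sentence boundaries are not respected.
    pair_counts = Counter(
        (first, second)
        for first, second in zip(words, words[1:])
        if first not in FUNCTION_WORDS and second not in FUNCTION_WORDS
    )
    return word_counts, pair_counts


word_counts, pair_counts = count_words_and_pairs(
    "The index is built by machine indexing. Machine indexing is fast."
)
print(word_counts.most_common(3))
print(pair_counts.most_common())
```

Excluding any pair with a function-word member leaves only content-word pairs such as
("machine", "indexing"), in keeping with the stated aim of attending to the semantic
rather than the syntactic aspect of word pairs.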
"Using a bucket list structure of the type proposed by C. J. Sheen in FN-1634, the
program sorts each incoming word serially, constructing a list within each of 256
buckets for good words of a given alphabetic range ... and another list within each
good word entry for the Doubles and Reverses which will be ordered alphabetically
1/ Needham, 1963 [433], p. 8.
2/ Stone et al., various references, p. 137 of this report.
3/ Yngve, 1962 [655], p. 26.