MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Other Potentially Related Research
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"Te ....... uses, as quasi-descriptors, word-sets chosen from the Oxford English
Dictionary (e.g., any word falling between A-Ah) and relies on the subsequent
correlation of terms to make sense of his seemingly bizarre choice." 1/
Lefkovitz is concerned with the so-called "automatic stratification" of a file in
which both generic or associative relationships and exclusive partitioning are used to
facilitate search. He claims:
"... The exclusive partitioning implies a separation of descriptors into groups
such that no two descriptors in a group co-occur in any given document description
of the file. This arrangement presents the dissociative properties of the file, or
forbidden combinations. When coupled with a superimposed display of the
'inclusive' or associative properties of the file a unique classification of the
descriptors of this file results, which is based solely upon the association of the
descriptors themselves within the document descriptions and not upon an arbitrary
set of classes constructed by professional indexers." 2/
The purpose is to assist the searcher by warning him that if he chooses more than
one descriptor from any one group as terms in his search request, there will be a null
response from this particular file. However, the particular application considered
involves a limited number of highly quantifiable or scalable "attribute-value" pairs (for
so the descriptors involved are defined), such as "Age-23" and "Hair-red". It is by
no means obvious that comparable exclusive partitionings could be achieved for literature
items or that the recomputations necessary as new items enter the file can be achieved
on a practical basis.
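
The partitioning described above can be illustrated with a minimal sketch. It assumes a simple greedy grouping, which is an illustrative choice and not the procedure Lefkovitz reports; the function names and the three sample document descriptions are hypothetical. Each descriptor is placed in the first group none of whose members co-occur with it in any document description, and a search request is then checked for more than one descriptor drawn from the same group.

```python
from itertools import combinations


def cooccurring_pairs(documents):
    """Collect every pair of descriptors that appear together in some document."""
    pairs = set()
    for descriptors in documents:
        for a, b in combinations(sorted(set(descriptors)), 2):
            pairs.add((a, b))
    return pairs


def exclusive_partition(documents):
    """Greedy grouping (an assumption): no two descriptors in a group co-occur."""
    pairs = cooccurring_pairs(documents)
    vocabulary = sorted({d for doc in documents for d in doc})
    groups = []
    for d in vocabulary:
        for group in groups:
            # Pairs are stored in sorted order, so normalize before the lookup.
            if all((min(d, g), max(d, g)) not in pairs for g in group):
                group.append(d)
                break
        else:
            groups.append([d])
    return groups


def null_response_warning(request, groups):
    """Return the descriptors in the request that share an exclusive group."""
    wanted = set(request)
    return [sorted(wanted & set(g)) for g in groups if len(wanted & set(g)) > 1]


# Hypothetical file of three document descriptions.
documents = [
    {"Age-23", "Hair-red"},
    {"Age-23", "Hair-brown"},
    {"Age-40", "Hair-red"},
]
groups = exclusive_partition(documents)
print(groups)
print(null_response_warning(["Age-23", "Age-40"], groups))
```

A greedy pass of this kind yields one valid partition among possibly several, and the whole grouping must be recomputed whenever a new item introduces a co-occurrence within an existing group, which is the practical difficulty noted above.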
6. OTHER POTENTIALLY RELATED RESEARCH
In this section we shall consider certain areas of potentially related research that
may prove applicable to the improvement of automatic indexing techniques. First is the
area of thesaurus construction and use, which in turn is somewhat related to the development
of statistical association techniques, especially for "indexing-at-time-of-search"
and search renegotiations. Natural language text searching will also be briefly
considered, together with related research in the general area of linguistic data
processing.
6.1 Thesaurus Construction, Use, and Up-Dating
The first area of potentially related research which promises improvements in
automatic indexing procedures is that of thesaurus lookups by machine. There are
several different possible definitions of the word "thesaurus" in the context of information
storage, selection and retrieval systems. The first is that it is a prescriptive
indexing aid, or authority list, serving the function of normalizing the indexing language,
primarily by the use of a single word form for words occurring in various inflections, by
the reduction of synonyms, and by the introduction of appropriate syndetic devices. The
second definition relates to the function of prompting the indexer or the searcher with
suggestions of additional terms and clues, and it follows the idea of word
groupings related to concepts as in a traditional thesaurus like Roget's. The third
1/ Cleverdon and Mills, 1963 [131], p. 8.
2/ Lefkovitz, 1963 [353], Preface, pp. VIII-IX.