MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion
chapter
Mary Elizabeth Stevens
National Bureau of Standards
existing classification schemes. However, in the related field of pattern recognition Uhr
and Vossler have shown promising results both for criterial feature analysis (a priori
assumption as to attributes or properties governing membership in specified classes) and
for randomly generated discrimination operators which, applied in a recursive manner,
are increasingly adaptive to the detection of class-membership (Uhr and Vossler, 1961
[615]).
One particular way of looking at the problems of automatic indexing results, in
effect, in placing these problems within the broader field of pattern perception and pattern
recognition. We suggest that this is in fact a particularly fruitful approach. Certainly
there is a wide area of potential commonality, and many promising leads for further
research in automatic categorization can be found in the general pattern recognition
literature, especially in work on randomly generated operators and on the problems of
determination of membership in classes. 1/ Conversely, automatic classification techniques
originally conceived as applicable to the handling of documentary information have in fact
been applied quite successfully to at least one case of groupings of physical objects on the
basis of machine-detectable common properties.
The question of determination of membership-in-classes is basic to the problems of
automatic classification and categorization. Thus the techniques for discriminating the
statistically significant associations between "properties" of objects or items that are to
be grouped into classes or categories, even when such "properties" are not known in
advance and have no a priori identification, point to an increasing and promising
convergence of research in pattern recognition, propaganda analysis and psycholinguistics,
mathematics and statistics, studies of linear threshold devices, and the like, as well as in the
linguistic data processing field as such.
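
As an illustration only, the discrimination of statistically significant associations
between "properties" might be sketched as follows. The chi-square test on a 2 x 2
contingency table is our assumption here; the text does not name a particular statistic.
The function name, the representation of items as sets of properties, and the threshold
of 3.84 (roughly the 5 per cent level for one degree of freedom) are likewise hypothetical.

```python
from collections import Counter
from itertools import combinations


def chi_square_associations(items, threshold=3.84):
    """Return positively associated property pairs and their chi-square values.

    items: list of sets, each set holding the properties observed for one item.
    threshold: assumed cutoff (about the 5% level for one degree of freedom).
    Only pairs that co-occur at least once are examined.
    """
    n = len(items)
    single = Counter()  # number of items showing each property
    pair = Counter()    # number of items showing both properties of a pair
    for props in items:
        single.update(props)
        pair.update(combinations(sorted(props), 2))

    associations = {}
    for (a, b), n_ab in pair.items():
        n_a, n_b = single[a], single[b]
        # Cells of the 2 x 2 table: both, a only, b only, neither.
        both, a_only, b_only = n_ab, n_a - n_ab, n_b - n_ab
        neither = n - n_a - n_b + n_ab
        denom = n_a * n_b * (n - n_a) * (n - n_b)
        if denom == 0:
            continue  # a property present in every item carries no information
        chi2 = n * (both * neither - a_only * b_only) ** 2 / denom
        # Keep only associations that are positive and exceed the cutoff.
        if chi2 >= threshold and both * neither > a_only * b_only:
            associations[(a, b)] = chi2
    return associations


if __name__ == "__main__":
    sample = [{"x", "y"}] * 4 + [{"z"}] * 4
    print(chi_square_associations(sample))
```

No names or prior definitions of the properties are required; pairs that survive the
cutoff could then serve as raw material for grouping items into classes.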
It is true that such synthesized "classes" may have no convenient "names" or
linguistic interpretations which make much sense to the individual human searcher or user.
Nevertheless, what is suggested is that a radical departure from conventional habits of
literature search and retrieval may be desirable from the standpoint of effective use of
machine potentialities. This might mean that, ab initio, the customer would pose to the
system a search query request not couched in his notion of words or terms actually used
in the system, but either (a) an outline or statement of his own research proposal and
plan of attack or (b) an indication of one or several items that he has already decided are
pertinent to his interests, with a request for "more like these".
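
A request of type (b), "more like these", might be sketched as follows. This is a
minimal illustration under our own assumptions, not a method described in the text:
items are represented by raw word counts, the customer's seed items are pooled into a
single profile, and the remaining items are ranked by cosine similarity to that profile.
The names term_vector, cosine, and more_like_these are hypothetical.

```python
import math
from collections import Counter


def term_vector(text):
    """Represent an item by its word counts (assumed: whitespace tokenization)."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two word-count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def more_like_these(collection, seed_ids, top_k=3):
    """Rank items not already chosen by similarity to the pooled seed items.

    collection: dict mapping item identifier to item text.
    seed_ids: identifiers of items the customer has judged pertinent.
    Items with no terms in common with the seeds are omitted.
    """
    vectors = {doc_id: term_vector(text) for doc_id, text in collection.items()}
    profile = Counter()
    for doc_id in seed_ids:
        profile.update(vectors[doc_id])
    scored = [
        (cosine(profile, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id not in seed_ids
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


if __name__ == "__main__":
    collection = {
        "d1": "automatic indexing of documents",
        "d2": "automatic classification of documents",
        "d3": "pattern recognition with random operators",
        "d4": "soil chemistry survey",
    }
    print(more_like_these(collection, ["d1"]))
```

The customer never supplies index terms of his own; the items he has already accepted
stand in for the query.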
An equally radical departure from conventional present habits and thinking is already
implicit in Needham's suggestion of an automatically derived classification system and
manual assignments thereto. 2/ It would attack present-day machine capacity and processing
time limitations such that property and class or category associations must be held to
something less than 1,000 x 1,000, unless prohibitive processing costs are to be incurred.
This approach would assume a one-time large-scale building of vocabulary and term or
category associations and derivation of assignment algorithms, and the printing out of the
results in multiple copies for use by low-level clerical personnel carrying out, indeed,
"machine-like" indexing.
A final promising approach to the future prospects for fully automatic indexing and
categorization is the perseverance in research and development efforts in advance of the
1/ See, for example, Sebestyen, 1961 [539], 1962 [538].
2/ Needham, 1963 [432], p. 1.