MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Automatic Classification and Categorization chapter Mary Elizabeth Stevens National Bureau of Standards

dichotomy can be observed. There is, on the one hand, a spate of examples of automatic derivative indexing where words used by the author himself or by human analysis are sorted and arranged, by machine, to provide index listings, announcement bulletins, and current awareness distribution notices. There are also, on the other hand, at least a few instances of investigations where the machine assigns category labels, indexing terms, or "heads" and "headings" from a classification schedule, to new items.

In general, as Needham 1/ points out, proposed automatic assignment indexing procedures can be investigated with reference to a previously existing index term vocabulary, an existing classification system or schedule, or to specially designed vocabularies and subject heading lists. On the other hand, if it is not known how well existing systems do in fact characterize documents and if it is not known whether all pertinent properties of the documents have been consistently identified, then it may be preferable to develop methods for assigning documents to the appropriate class in a classification system which is itself set up automatically. 2/ Needham also suggests still a third possibility: that of setting up automatically a classification within which the subsequent classifying of documents is done by hand.

The principal experimental results, to date, of attempts to achieve automatic classification of documentary items, especially in the sense of machine-generated groupings or categorizations of such items, have been those of applying techniques of "clumping", 3/ factor analysis, and "latent class analysis".
4/ We shall briefly consider below some typical investigations into automatic classification or categorization procedures that have already had, or may have, applicability in automatic indexing techniques.

In the late 1950's, Tanimoto undertook theoretical studies of mathematical approaches to problems of classification and prediction with special reference to matrix manipulations of sets of attributes of items to be classified. 5/ He also investigated

1/ Needham, 1963 [432], p. 1.

2/ Ibid., pp. 1-2: "If we are to assign a document to a class automatically, we must have a) a list of facts about the classes which will make ascription possible: b) an algorithm, usually some sort of matching algorithm, to tell us which class best suits a document. Given a classification like the U.D.C., it is not at all obvious that a) and b) exist, or even, if they can be found. a) and b) imply a degree of uniformity about the classification which may just not be there."

3/ That is, the clustering of objects that are in some sense similar because they share certain attributes or properties, even if, and especially when, the identity of cluster-producing common properties is not known in advance.

4/ Compare Doyle, 1963 [162], p. 13: "There are other statistical techniques besides factor analysis whose output is document clusters, such as latent class analysis and clump theory, and there is a surprising increase in research in this kind of analysis just within the last two years."

5/ Tanimoto, 1958 [593], 1961 [594]. See also Borko, 1963 [76], pp. 4-5: "In 1958, Tanimoto published a theoretical paper on the applications of mathematics to the problems of classification and prediction. Specifically, he pointed out how the problems of classification can be formulated in terms of sets of attributes and manipulated as matrix functions."