MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report. Automatic Classification and Categorization chapter. Mary Elizabeth Stevens, National Bureau of Standards

name for the techniques evolved in this way is factor analysis. Insofar as it is practically applicable this technique has worked well enough; but... it has two limitations (a) that some classification problems are outside its scope, and (b) that it is not susceptible (at least as hitherto conceived) of adaptation computationally to the study of really large universes. 1/ The procedure of factor analysis first finds certain clumps, but then, as output, it gives us vectors relating the descriptors of the universe to the clumps found...

"In most cases, factor analysis is used (especially in psychology) to debug the descriptor space; more conventionally put, to eliminate those tests (descriptors) which have an equivocal membership in several factors (Clumps) in favor of those which, having more definite allegiances, convey more information of the kind which the analysis suggests as valuable. It is thus only related to the classification of the universe at one remove; the classification it suggests is a simple categorical classification defined by the descriptors suggested as the most valuable...

"The descriptive array of a universe is a table giving the applicability or inapplicability of each descriptor to each element. To classify the elements of the universe, we calculate for every pair of elements a similarity as a function of the corresponding rows of the descriptive array, and then regard the similarity matrix as a sufficient description of the universe. In factor analysis, on the contrary, we start with the matrix of correlations between the descriptors, each being a function of a pair of columns of the descriptive array..."
2/

Other investigators who have considered factor analysis techniques for possible applications to automatic indexing, automatic categorization of items in a collection of items, or search prescription renegotiation in a mechanized selection and retrieval system include Stiles (1962 [573]), Doyle (1963 [162]), and Hammond (1962 [251]). Stiles, whose principal experimental results relate rather to the use of statistical associations between terms manually assigned to documents for search prescription formulation and renegotiation than to automatic indexing procedures as such, 3/ has also considered both automatic indexing and automatic classification approaches. Specifically, he has made at least preliminary investigations of the factor analysis technique independently developed for similar purposes by Borko. For a large collection of 105,000 items, the statistics of co-occurrence of indexing terms were in some cases not as precise as desired because the same terms were used in different senses for different items in the collection.

1/ Note that Borko himself confirms this limitation as recently as November 1963, in stating, of the CLRU work on clumps: "However, even now these techniques have been applied to a 346x346 matrix which is beyond the capabilities of presently available factor analysis programs." (1963 [76], p. 8).

2/ Parker-Rhodes, 1961, [464], pp. 3-6.

3/ This principal concern is discussed below with reference to potentially related research, pp. 119-122 of this report.
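
The contrast drawn in the quoted passage, between a similarity matrix computed from pairs of rows of the descriptive array and a correlation matrix computed from pairs of columns, can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the passage says only that the similarity is "a function of the corresponding rows", so the Tanimoto coefficient used here is one possible choice, the Pearson coefficient is one possible reading of "correlations between the descriptors", and the array and the function names are hypothetical.

```python
from math import sqrt

# Hypothetical descriptive array: one row per element (e.g. a document),
# one column per descriptor; 1 = descriptor applies, 0 = it does not.
DESCRIPTIVE_ARRAY = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [1, 0, 0, 0],
]


def tanimoto(row_a, row_b):
    """Assumed element similarity: shared descriptors / descriptors in either row."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


def pearson(col_a, col_b):
    """Assumed descriptor correlation: Pearson coefficient of two columns."""
    n = len(col_a)
    mean_a = sum(col_a) / n
    mean_b = sum(col_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(col_a, col_b))
    var_a = sum((a - mean_a) ** 2 for a in col_a)
    var_b = sum((b - mean_b) ** 2 for b in col_b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_a * var_b)


def element_similarity_matrix(array):
    """Row-wise view: one entry per pair of elements."""
    n = len(array)
    return [[tanimoto(array[i], array[j]) for j in range(n)] for i in range(n)]


def descriptor_correlation_matrix(array):
    """Column-wise view (the factor analysis starting point): one entry per pair of descriptors."""
    columns = list(zip(*array))  # transpose rows into columns
    m = len(columns)
    return [[pearson(columns[i], columns[j]) for j in range(m)] for i in range(m)]


similarity = element_similarity_matrix(DESCRIPTIVE_ARRAY)
correlation = descriptor_correlation_matrix(DESCRIPTIVE_ARRAY)
print(len(similarity), "x", len(similarity[0]), "element similarity matrix")
print(len(correlation), "x", len(correlation[0]), "descriptor correlation matrix")
```

With five elements and four descriptors, the first matrix is indexed by elements and the second by descriptors, which is the distinction the passage turns on: the row-wise matrix describes the universe to be classified directly, while the column-wise matrix describes the descriptor space.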