MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Automatic Classification and Categorization chapter Mary Elizabeth Stevens National Bureau of Standards

The possibilities of using factor analysis to sort out the different meanings were therefore explored. 1/ Using an IBM 704 program, the centroid method of factor analysis was applied to a matrix of correlation coefficients of terms that had co-occurred significantly with the term "exposure". Three factors were derived, one generally relating to the corrosive effects of exposure, another to "exposure" in the sense of photographic exposure, and the third dealing with both exposure-to-weather and exposure-to-radiation. Although the results were considered quite satisfactory, more extensive experimentation and use did not appear feasible because of computer matrix manipulation limitations.

Doyle notes, in particular, that factor analysis might be used to give well-defined clusters separated one from another by clear boundaries rather than the less precise clusters found by most document grouping techniques. He emphasizes, however, that "its success in doing so of course, depends on the well-defined clusters actually being present in the data". He suggests that a combination of factor analysis and human editing to select items most typical of statistically derived categories could be valuable in such applications as the sorting of Congressional mail or the identification of trends in political or military intelligence materials free from the personal biases of an analyst.

Hammond and his Datatrol associates, who have worked on an application of the Stiles association factor technique for search question negotiation to legal literature, have also considered the potentialities of factor analysis. Thus they report:

"The present association factor gives the relationship of one term to another. A factor analysis study would allow us to determine the relationship of a single term to a group of terms.
From this we could learn how terms cluster when related to the same concept." 3/

5.2 The Theory of Clumps

It is assumed, in the work on the theory of clumps, that we have a population of objects or items among which at least some classes or groupings do objectively exist, but that we do not have any bases for precisely determining class membership requirements. There may, therefore, be many possible ways of grouping and many possible definitions of clumps. On the other hand, such diverse definitions must conform to the extent of some similarities of membership in the clumps that they define if in fact they do define any of the existing classes.

Assuming further that we are given information about properties ascribable to various members of the population, it is theorized that useful clumps can be discovered by investigating similarity connections between pairs of items, such as the number of co-occurrences of specific properties. Thereafter, only these similarity connections are considered, and the connection matrix is used as the basis for trial partitions of the population into various possible subsets.

1/ Stiles, 1962 [573], pp. 10-12.
2/ Doyle, 1963 [162], p. 12.
3/ Hammond et al., 1962 [251], p. 17.
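
As an illustration of the similarity connections described in Section 5.2, the following is a minimal sketch in Python. It assumes that each item is described by a set of properties and that the connection between two items is simply the number of properties they share. The item names, the property sets, and the helper functions connection_matrix and cross_connections are hypothetical and introduced only for this example. In particular, summing the connections that cross a trial partition is one simple way to compare candidate subsets; it is an assumption of this sketch, not the definition of a clump used in the work reported here.

```python
from itertools import combinations


def connection_matrix(properties):
    """Count the properties shared by every pair of items.

    `properties` maps an item name to the set of properties ascribed to it.
    Returns a symmetric nested dict; an item's connection to itself is 0.
    """
    items = sorted(properties)
    matrix = {a: {b: 0 for b in items} for a in items}
    for a, b in combinations(items, 2):
        shared = len(properties[a] & properties[b])
        matrix[a][b] = shared
        matrix[b][a] = shared
    return matrix


def cross_connections(matrix, subset):
    """Sum the connections between a trial subset and the remaining items.

    Assumption of this sketch: a lower total suggests that the trial
    partition separates the population more cleanly.
    """
    inside = set(subset)
    outside = set(matrix) - inside
    return sum(matrix[a][b] for a in inside for b in outside)


# Hypothetical toy population: four items and their properties.
properties = {
    "doc1": {"corrosion", "metal", "exposure"},
    "doc2": {"corrosion", "metal", "weather"},
    "doc3": {"film", "camera", "exposure"},
    "doc4": {"film", "camera", "lens"},
}

matrix = connection_matrix(properties)

# Compare two trial partitions of the population.
good_split = cross_connections(matrix, {"doc1", "doc2"})
poor_split = cross_connections(matrix, {"doc1", "doc3"})
print(good_split, poor_split)
```

Once the matrix is built, the property sets are no longer consulted; only the pairwise connections are used to compare trial subsets, which mirrors the statement that "only these similarity connections are considered". In this toy population the split {doc1, doc2} versus {doc3, doc4} cuts fewer connections than the split {doc1, doc3} versus {doc2, doc4}.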