MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Conclusion chapter Mary Elizabeth Stevens National Bureau of Standards

convinced that it is not, 1/ and for this reason, research efforts are being directed toward these other considerations.

On-going research and development work - whether in modified derivative indexing approaching a "concept-indexing" level; in automatic assignment indexing techniques as such; in automatic classification or categorization procedures; or in potentially related efforts directed toward automatic abstracting, automatic content analysis, and other aspects of linguistic data processing - is both reasonably extensive and quite promising. Most of the investigators who are seriously active in the field report their current objectives and recent accomplishments regularly to the National Science Foundation for publication in the series "Current Research and Development Efforts in Scientific Documentation." In the most recent issue, unfortunately current only as of November, 1962, there are not less than 25 reports of KWIC and similar title-permuted derivative indexing methods generated or proposed-to-be-generated by machine; there are several instances of investigations into various possibilities of modified derivative indexing to be accomplished by machine; and there are five to ten reports of active experimentation with various automatic assignment indexing schemes. These efforts and even more recently organized projects point in the hopeful direction that "KWIC indexes should be merely a sample of things to come". 2/

Assignment indexing techniques so far investigated can be, as we have seen, of two types which are quite distinct in terms of the principles involved. The first, which can be the more readily mechanized, involves the use of thesaurus-type lookup procedures covering the definable rules of "scope notes", "authority lists", or "see also" reference practice.
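The first, thesaurus-type kind of assignment indexing can be illustrated with a minimal sketch. It is not a reconstruction of any system surveyed in this report; the authority list, the "see also" table, and the function name `assign_index_terms` are hypothetical, chosen only to show how lookup rules of this kind might be mechanized.

```python
import re

# Hypothetical authority list: variant word or phrase -> preferred index term.
AUTHORITY_LIST = {
    "computer": "COMPUTERS",
    "computers": "COMPUTERS",
    "data processing": "DATA PROCESSING",
    "machine translation": "MECHANICAL TRANSLATION",
    "mechanical translation": "MECHANICAL TRANSLATION",
}

# Hypothetical "see also" references: preferred term -> related preferred terms.
SEE_ALSO = {
    "MECHANICAL TRANSLATION": ["LINGUISTICS"],
}


def assign_index_terms(text, authority=AUTHORITY_LIST, see_also=SEE_ALSO):
    """Return the sorted preferred terms whose variants occur in the text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    assigned = set()
    for variant, preferred in authority.items():
        # Match whole words or phrases only, never fragments of longer words.
        if re.search(r"\b" + re.escape(variant) + r"\b", normalized):
            assigned.add(preferred)
    # Follow "see also" references one level deep.
    for term in list(assigned):
        assigned.update(see_also.get(term, []))
    return sorted(assigned)


if __name__ == "__main__":
    print(assign_index_terms("Machine translation by computer."))
```

Every rule in this sketch is an explicit table entry, which is why this type is the more readily mechanized: no reference to the rest of the collection is needed to decide an assignment.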
The second type of assignment indexing, however, depends upon decision-making as to the propriety of assigning a particular indexing term to a particular document with reference to assignments to the collection as a whole (or a sample thereof). This latter type of assignment may be in terms of a priori categorizations of separable subsets of the collection. Alternatively, the bases for the latter type assignment-indexing procedures may be derived from a posteriori determinations of the suitable subsets as in the factor analysis experiments of Borko, the latent class analysis approach of Baker, and the clustering-clumping approaches to automatic classification of Needham and others. It is to be noted in particular that Needham thinks an automatically generated categorization is preferable precisely because of lack of knowledge as to the exact attributes defining a class in

1/ See, for example, Climenson et al, 1962 [133], p. 178: "The statistical approach attempts to use no more than the occurrences of word spellings and their relative distances in the document environment ... [and] cannot provide the discrimination necessary for most indexing and abstracting applications"; Doyle, 1963 [162], p. 3: "Automatic indexing and abstracting, as currently conceived, do not require any sort of dictionary or other semantic reference, but only counting, comparing, and sorting - operations well known in numerical data processing. But success in applying such rules on a purely automatic basis can't help but be limited"; Borko, 1962 [75], p. 5: "Although difficult, identification [of different meanings carried by the same word, of the same meaning carried by different words] must be accomplished before the automatic categorization of document content can be truly effective. For the most part statistical methods, and even syntactic analysis, are inadequate for the job. A technique of textual analysis based upon the semantic properties of language is needed"; Grosch, 1959 [244], p.
20: "We need semantic methods ... that will look for the intersection of redundant descriptors, each of which is at least slightly erroneous."

2/ Doyle, 1962 [163], p. 381.