MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion
chapter
Mary Elizabeth Stevens
National Bureau of Standards
convinced that it is not, 1/ and for this reason, research efforts are being directed toward
these other considerations.
On-going research and development work - whether in modified derivative indexing
approaching a "concept-indexing" level; in automatic assignment indexing techniques as
such; in automatic classification or categorization procedures, or in potentially related
efforts directed toward automatic abstracting, automatic content analysis, and other
aspects of linguistic data processing - is both reasonably extensive and quite promising.
Most of the investigators who are seriously active in the field report their current
objectives and recent accomplishments regularly to the National Science Foundation for
publication in the series "Current Research and Development Efforts in Scientific
Documentation." In the most recent issue, unfortunately current only as of November, 1962,
there are not less than 25 reports of KWIC and similar title-permuted derivative indexing
methods generated or proposed-to-be-generated by machine, there are several instances
of investigations into various possibilities of modified derivative indexing to be accom-
plished by machine, and there are five to ten reports of active experimentation with various
automatic assignment indexing schemes. These efforts and even more recently organized
projects point in the hopeful direction that "KWIC indexes should be merely a sample of
things to come". 2/
Assignment indexing techniques so far investigated can be, as we have seen, of two
types which are quite distinct in terms of the principles involved. The first, which can be
the more readily mechanized, involves the use of thesaurus-type lookup procedures cover-
ing the definable rules of "scope notes", "authority lists", or "see also" reference prac-
tice. The second type of assignment indexing, however, depends upon decision-making as
to the propriety of assigning a particular indexing term to a particular document with
reference to assignments to the collection as a whole (or a sample thereof). This latter
type of assignment may be in terms of a priori categorizations of separable subsets of the
collection.
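
The distinction between the two types may be illustrated by a minimal sketch. It is not drawn from any of the systems surveyed: the authority list, the "see also" table, the sample categories, and the function names lookup_assign and profile_assign are all hypothetical, and the word-overlap score merely stands in for whatever statistical decision rule a given investigator might use.

```python
from collections import Counter

# Hypothetical authority list: variant wording -> authorized index term.
AUTHORITY = {
    "computers": "COMPUTERS",
    "calculating machines": "COMPUTERS",
    "abstracts": "ABSTRACTING",
}
# Hypothetical "see also" references between authorized terms.
SEE_ALSO = {"ABSTRACTING": ["INDEXING"]}


def lookup_assign(text):
    """First type: thesaurus-type lookup against an authority list."""
    lowered = text.lower()
    # Crude substring matching; a real system would tokenize and normalize.
    terms = {auth for variant, auth in AUTHORITY.items() if variant in lowered}
    for term in list(terms):
        terms.update(SEE_ALSO.get(term, []))
    return sorted(terms)


def profile_assign(text, category_samples):
    """Second type: choose the a priori category whose previously indexed
    sample documents share the most word occurrences with the text."""
    words = Counter(text.lower().split())

    def overlap(samples):
        profile = Counter(" ".join(samples).lower().split())
        return sum(min(count, profile[w]) for w, count in words.items())

    return max(category_samples, key=lambda c: overlap(category_samples[c]))
```

The first function needs only the text and fixed tables, which is why that type is the more readily mechanized. The second cannot decide anything without reference to items already assigned in the collection or a sample of it.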
Alternatively, the bases for the latter type assignment-indexing procedures may be
derived from a posteriori determinations of the suitable subsets as in the factor analysis
experiments of Borko, the latent class analysis approach of Baker, and the clustering-
clumping approaches to automatic classification of Needham and others. It is to be noted
in particular that Needham thinks an automatically generated categorization is preferable
precisely because of lack of knowledge as to the exact attributes defining a class in
1/ See, for example, Climenson et al, 1962 [133], p. 178: "The statistical approach
attempts to use no more than the occurrences of word spellings and their relative
distances in the document environment ... [and] cannot provide the discrimination
necessary for most indexing and abstracting applications"; Doyle, 1963 [162], p.3:
"Automatic indexing and abstracting, as currently conceived, do not require any sort
of dictionary or other semantic reference, but only counting, comparing, and sorting-
operations well known in numerical data processing. But success in applying such
rules on a purely automatic basis can't help but be limited"; Borko, 1962 [75], p. 5:
"Although difficult, identification [of different meanings carried by the same word,
of the same meaning carried by different words] must be accomplished before the
automatic categorization of document content can be truly effective. For the most
part statistical methods, and even syntactic analysis, are inadequate for the job. A
technique of textual analysis based upon the semantic properties of language is
needed"; Grosch, 1959 [244], p. 20: "We need semantic methods ... that will look for
the intersection of redundant descriptors, each of which is at least slightly
erroneous."
2/ Doyle, 1962 [163], p. 381.