MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

"1. Because of the mechanical method of preparation, more information may be displayed than would have been practicable by conventional means.

"2. Keywords-in-context permit the cross-correlation of subjects to an extent not realizable by conventional procedures." 1/

The most common type of complaint against the KWIC indexing method is, as we have noted earlier, identical with that which is applied to word indexing in general: the lack of terminological control. Where the indexing terms are restricted to those used by the author himself, in his title or even full text, there arise many serious problems of synonyms, near-synonyms, homographs, neologisms, and eponyms. The effects of machine inability to resolve these problems are redundancy, scatter of references throughout the index, "haphazard groupings" 2/ and retrieval losses because the user is forced to guess at the terminology the author actually used. 3/ These problems are severely aggravated when only the title is used as the basis for index-word extraction.

Thus, a first and major question in attempting to appraise the effectiveness of KWIC-indexing techniques is that of the adequacy of titles alone as the source of subject content clues. Spurred on at least in part by the existence of KWIC-type indexes, several investigators have studied this question, with somewhat different results. Williams has explored for some years the possibilities of developing systematic procedures for title elaboration, especially making explicit information that is implied.
Her conclusions are that indexing by title and direct elaboration of the title would produce index information equivalent to that found in Chemical Abstracts for about 50 percent of the documents studied, but that other procedures would be required for the remainder. 4/

Specific studies of title adequacy for a particular journal or field have been undertaken by both the American Institute of Physics and the Biological Sciences Communications Project. In the A.I.P. experiments, graduate physics students were asked to locate from limited clues certain specific articles appearing in The Physical Review, and search times were checked for their use of permuted title and other indexes. Another group of students compared the subject index entries in Physics Abstracts and Chemical Abstracts with the words in the titles of 25 papers from The Physical Review. In the case of Physics Abstracts, 69 percent of the entries for these papers were found in the words of the title and 63 percent of the titles contained all of the information supplied by the set of index entries. In the case of Chemical Abstracts, the corresponding percentages were 47 and 23. 5/ These latter findings, for the chemical index, are closely corroborated

1/ Luhn, 1959, [381], p. 295.
2/ Olney, 1963, [458], p. 44.
3/ See, for example, Dowell and Marshall, 1962, [159], p. 324: "This problem of 'conceptual scatter' becomes a nightmare when highly idiosyncratic author language is used as a basis for subject indexing."
4/ Williams, 196[?], [643], pp. 36[?]-363.
5/ Maizell, 1960, [392], p. 126.