MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"1. Because of the mechanical method of preparation, more information
may be displayed than would have been practicable by conventional
means.
"2. Keywords-in-context permit the cross-correlation of subjects to an
extent not realizable by conventional procedures." 1/
The most common type of complaint against the KWIC indexing method is, as we
have noted earlier, identical with that which is applied to word indexing in general--the
lack of terminological control. Where the indexing terms are restricted to those used by
the author himself, in his title or even full text, there arise many serious problems of
synonyms, near-synonyms, homographs, neologisms, and eponyms. The effects of
machine inability to resolve these problems are redundancy, scatter of references
throughout the index, "haphazard groupings" 2/ and retrieval losses because the user is
forced to guess at the terminology the author actually used. 3/ These problems are
severely aggravated when only the title is used as the basis for index-word extraction.
Thus, a first and major question in attempting to appraise the effectiveness of KWIC-
indexing techniques is that of the adequacy of titles alone as the source of subject content
clues. Spurred on at least in part by the existence of KWIC-type indexes, several
investigators have studied this question, with somewhat different results. Williams has
explored for some years the possibilities of developing systematic procedures for title
elaboration, especially making explicit information that is implied. Her conclusions are
that indexing by title and direct elaboration of the title would produce index information
equivalent to that found in Chemical Abstracts for about 50 percent of the documents
studied, but that other procedures would be required for the remainder. 4/
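
The scatter problem described above can be illustrated with a minimal sketch in Python. The stop list, the three titles, and the function name kwic_entries are all invented for this illustration and are not drawn from any of the systems discussed. The sketch simply rotates each title around its significant words and sorts the result, with no terminological control of any kind.

```python
# Hypothetical stop list; operational KWIC systems used much longer ones.
STOP_WORDS = {"a", "an", "and", "by", "for", "in", "of", "on", "the", "to", "with"}

# Invented titles, chosen so that near-synonyms appear in different titles.
TITLES = [
    "Automatic Indexing of Chemical Literature",
    "Mechanized Indexing of Physics Abstracts",
    "Machine Preparation of Permuted Title Indexes",
]


def kwic_entries(titles):
    """Return sorted (keyword, context) pairs, one per significant title word."""
    entries = []
    for title in titles:
        words = title.split()
        for i, word in enumerate(words):
            key = word.lower().strip(".,:;")
            if not key or key in STOP_WORDS:
                continue
            # Rotate the title so the keyword leads; "/" marks the title's end.
            context = " ".join(words[i:] + ["/"] + words[:i])
            entries.append((key, context))
    # Alphabetical filing by keyword, as in a printed permuted index.
    return sorted(entries)


index = kwic_entries(TITLES)
for key, context in index:
    print(f"{key:<12} {context}")
```

In the sorted listing, the near-synonyms "automatic", "machine", and "mechanized" file at separate points of the alphabet, and "indexes" files apart from "indexing", so a searcher must think of each word an author might have chosen.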
Specific studies of title adequacy for a particular journal or field have been
undertaken by both the American Institute of Physics and the Biological Sciences
Communications Project. In the A.I.P. experiments, graduate physics students were asked to
locate from limited clues certain specific articles appearing in The Physical Review, and
search times were checked for their use of permuted title and other indexes. Another
group of students compared the subject index entries in Physics Abstracts and Chemical
Abstracts with the words in the titles of 25 papers from The Physical Review. In the case
of Physics Abstracts, 69 percent of the entries for these papers were found in the words
of the title and 63 percent of the titles contained all of the information supplied by the
set of index entries. In the case of Chemical Abstracts, the corresponding percentages
were 47 and 23. 5/ These latter findings, for the chemical index, are closely corroborated
1/ Luhn, 1959, [381], p. 295.
2/ Olney, 1963, [458], p. 44.
3/ See, for example, Dowell and Marshall, 1962, [159], p. 324: "This problem of
   'conceptual scatter' becomes a nightmare when highly idiosyncratic author
   language is used as a basis for subject indexing."
4/ Williams, 196[OCRerr] [643], pp. 36[OCRerr]-363.
5/ Maizell, 1960, [392], p. 126.