MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Indexes Generated by Machine-Automatic Derivative Indexing
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Ruhl (1963 [506]) found that between 50 and 90 percent of author-prepared titles (the
variation depending on subject field and other circumstances) did fully reflect the index
terms assigned to these documents by human indexers. Lane, and White and Walsh, have
also made studies directly related to the question of KWIC index effectiveness. The latter
two investigators report only 52 percent retrieval effectiveness for a permuted title index
to the Abstracts of Computer Literature, 1962, which they attribute to the changing
terminology in the still new field of computer technology. 1/ Lane made counts of titles
that would be "acceptable" and those that would not for a KWIC index for 50 titles drawn
from each of 10 published indexes. He concluded that, if there were judicious pre-editing,
technical articles in the technical subject indexes could be quite adequately covered, and
papers in the fields of law, business, and the humanities somewhat less satisfactorily so,
but that for the material indexed in the Reader's Guide to Periodical Literature, the KWIC
technique would fail 58 percent of the time. 2/
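The permuted title (KWIC) technique assessed in these studies can be illustrated by a minimal sketch. It assumes a short, hypothetical stop list and a simple rotation format in which "/" marks the original start of the title; the function names `kwic_entries` and `kwic_index` are illustrative only and are not drawn from any of the systems cited.

```python
import re

# Hypothetical stop list; an operational KWIC system would use a much longer one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def kwic_entries(title, stop_words=STOP_WORDS):
    """Return (keyword, rotated title) pairs, one per non-stop word in the title."""
    words = re.findall(r"[A-Za-z0-9']+", title)
    entries = []
    for i, word in enumerate(words):
        if word.lower() in stop_words:
            continue
        # Rotate the title so the keyword leads; "/" marks the original start.
        rotated = " ".join(words[i:] + ["/"] + words[:i])
        entries.append((word.lower(), rotated))
    return entries


def kwic_index(titles):
    """Build a permuted title index sorted alphabetically by keyword."""
    index = []
    for title in titles:
        index.extend(kwic_entries(title))
    return sorted(index)
```

Because every index entry is derived from a title word, a searcher finds an item only if he thinks of a word the author happened to use, which is the limitation the effectiveness figures above are measuring.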
Montgomery and Swanson have studied, as has O'Connor in even more detail, the
adequacy of "machine-like indexing by people". Montgomery and Swanson took as their
test corpus the September 1960 issue of Index Medicus and found that for 4,770 items,
85.8 percent contained either the word itself or a synonym for the subject heading
assigned, slightly over 11 percent did not, and in the remaining cases the investigators
could not clearly decide. They concluded, therefore, that: "Most of the articles studied
could have been indexed by machine on the basis of machine `inspection' of article titles
alone." 3/ O'Connor, however, typically reports that of a random sample of 50 papers
manually indexed under the term "Toxicity", five had titles which contained the word
"toxic" or the word "toxicity" and 34 had titles which were not even indirectly connected
with the term. ([443], [444], [445], [447] and [448]). With respect to the
Montgomery-Swanson conclusions as such, Carlson raises the further critical questions of
over-assignment and false drops and suggests that: "a simple machine processing of titles
would give us way too much or practically nothing." 4/
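The kind of "machine inspection" of titles at issue here can be sketched as follows. This is a minimal illustration, not the procedure used by any of the investigators cited: the synonym table, the function names, and the single-word treatment of subject headings are all assumptions made for the example.

```python
import re

# Hypothetical synonym table: each subject heading maps to the title words
# accepted as equivalent to it. A real study would need a full thesaurus.
SYNONYMS = {
    "toxicity": {"toxicity", "toxic", "poisoning"},
}


def title_matches_heading(title, heading, synonyms=SYNONYMS):
    """True if the title contains the heading word or a listed synonym."""
    title_words = set(re.findall(r"[a-z]+", title.lower()))
    key = heading.lower()
    # Headings absent from the table are matched on the heading word alone.
    accepted = synonyms.get(key, {key})
    return bool(title_words & accepted)


def match_rate(records, synonyms=SYNONYMS):
    """Fraction of (title, heading) pairs whose title matches its heading."""
    if not records:
        return 0.0
    hits = sum(title_matches_heading(t, h, synonyms) for t, h in records)
    return hits / len(records)
```

The outcome of such a count depends heavily on how generous the synonym table is, which is one reason the agreement figures reported by different investigators diverge so widely. Matching on every title word also says nothing about over-assignment, the problem Carlson raises.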
Research activities at the American Bar Foundation have included checking of
KWIC type indexing of several thousand legal articles with the subject headings assigned
under the "Index to Legal Periodicals" system (Kraft, 1962 [333]). It is reported that:
1/ White and Walsh, 1963 [639], p. 346.
2/ Lane, 1964 [345], p. 46.
3/ Montgomery and Swanson, 1962 [421], p. 359. In another study (1962 [534], p. 468),
Swanson reports findings for several thousand entries in classified bibliographies
where approximately 90 percent of the sampled items contained title words that were
identical, or similar in meaning, to the subject headings under which they were
indexed. He notes, however, that similar results could have been produced by
machine processing with the significant proviso that the machine have available an
adequate synonym dictionary or thesaurus.
4/ G. Carlson, 1963 [100], pp. 328-329.