MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Indexes Generated by Machine-Automatic Derivative Indexing chapter Mary Elizabeth Stevens National Bureau of Standards

Ruhl (1963 [506]) found that between 50 and 90 percent of author-prepared titles (the variation depending on subject field and other circumstances) did fully reflect the index terms assigned to these documents by human indexers. Lane, and White and Walsh, have also made studies directly related to the question of KWIC index effectiveness. The latter two investigators report only 52 percent retrieval effectiveness for a permuted title index to the Abstracts of Computer Literature, 1962, which they attribute to the changing terminology in the still new field of computer technology. 1/ Lane made counts of titles that would be "acceptable" and those that would not for a KWIC index for 50 titles drawn from each of 10 published indexes. He concluded that, if there were judicious pre-editing, technical articles in the technical subject indexes could be quite adequately covered, and papers in the fields of law, business, and the humanities somewhat less satisfactorily so, but that for the material indexed in the Reader's Guide to Periodical Literature, the KWIC technique would fail 58 percent of the time. 2/

Montgomery and Swanson have studied, as has O'Connor in even more detail, the adequacy of "machine-like indexing by people". Montgomery and Swanson took as their test corpus the September 1960 issue of Index Medicus and found that for 4,770 items, 85.8 percent contained either the word itself or a synonym for the subject heading assigned, slightly over 11 percent did not, and in the remaining cases the investigators could not clearly decide. They concluded, therefore, that: "Most of the articles studied could have been indexed by machine on the basis of machine `inspection' of article titles alone."
3/ O'Connor, however, typically reports that of a random sample of 50 papers manually indexed under the term "Toxicity", five had titles which contained the word "toxic" or the word "toxicity" and 34 had titles which were not even indirectly connected with the term ([443], [444], [445], [447] and [448]). With respect to the Montgomery-Swanson conclusions as such, Carlson raises the further critical questions of over-assignment and false drops and suggests that: "a simple machine processing of titles would give us way too much or practically nothing." 4/

Research activities at the American Bar Foundation have included checking of KWIC type indexing of several thousand legal articles against the subject headings assigned under the "Index to Legal Periodicals" system (Kraft, 1962 [333]). It is reported that:

1/ White and Walsh, 1963 [639], p. 346.
2/ Lane, 1964 [345], p. 46.
3/ Montgomery and Swanson, 1962 [421], p. 359. In another study (1962 [534], p. 468), Swanson reports findings for several thousand entries in classified bibliographies where approximately 90 percent of the sampled items contained title words that were identical, or similar in meaning, to the subject headings under which they were indexed. He notes, however, that similar results could have been produced by machine processing with the significant proviso that the machine have available an adequate synonym dictionary or thesaurus.
4/ G. Carlson, 1963 [100], pp. 328-329.
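The kind of title "inspection" measured by Montgomery and Swanson, together with the synonym-dictionary proviso in footnote 3, can be illustrated with a minimal Python sketch. It is not the procedure used in any of the cited studies. The function names, the thesaurus structure, and the sample titles are all hypothetical, and only single-word headings and synonyms are handled. It counts the fraction of items whose title contains the assigned subject heading or a listed synonym.

```python
import re


def title_words(title):
    """Lowercase the title and return its alphabetic word tokens as a set."""
    return set(re.findall(r"[a-z]+", title.lower()))


def title_reflects_heading(title, heading, synonyms):
    """Return True if the title contains the heading word or a listed synonym.

    `synonyms` is a hypothetical thesaurus mapping a lowercase heading to a
    set of words treated as equivalent. Multi-word headings are not handled.
    """
    key = heading.lower()
    accepted = {key} | {s.lower() for s in synonyms.get(key, set())}
    return bool(title_words(title) & accepted)


def title_coverage(items, synonyms):
    """Fraction of (title, heading) pairs whose title reflects the heading."""
    if not items:
        return 0.0
    hits = sum(title_reflects_heading(t, h, synonyms) for t, h in items)
    return hits / len(items)


# Invented sample data, for illustration only (not taken from any study).
SAMPLE_SYNONYMS = {"toxicity": {"toxic", "poisoning"}}
SAMPLE_ITEMS = [
    ("Toxic effects of lead in rats", "Toxicity"),
    ("Acute poisoning by carbon monoxide", "Toxicity"),
    ("Liver damage following prolonged exposure to solvents", "Toxicity"),
    ("Studies on the toxicity of beryllium", "Toxicity"),
]

if __name__ == "__main__":
    print(title_coverage(SAMPLE_ITEMS, SAMPLE_SYNONYMS))  # with thesaurus
    print(title_coverage(SAMPLE_ITEMS, {}))               # exact word only
```

On this invented sample the coverage falls from three titles in four to one in four when the thesaurus is removed, which is the dependence Swanson's proviso points to. The sketch measures only whether assigned headings are recoverable from titles. It does not address Carlson's concern about over-assignment and false drops, which would require checking titles against headings that were not assigned.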