MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report. Problems of Evaluation (chapter). Mary Elizabeth Stevens, National Bureau of Standards.

"A total of 2245 headings were suggested, averaging 1.1004 headings per book per student. These headings represented 373 different varieties, of which 368 were different from the headings traced on the Library of Congress cards for the sample books... As an average 62.17 different headings were suggested for each book...

"When the 368 different varieties of incorrect headings were analyzed in accordance with certain criteria that had been set up, it was found that incorrect specificity was a factor in 93.48%, incorrect terminology in 79.08% and incorrect form of entry in 72.28% of the headings... Over half of the incorrect headings (54.62%) had some combination of two errors, and almost half (49.73%) could have been converted into 'correct' headings only by changing the level of specificity, and by revising the terminology, and by altering the form...

"It was also found, contrary to the general assumption that failure in specificity almost always means that the reader is approaching his subject from too broad a point of view, that of those headings in which an incorrect level of specificity was a factor... 64.82% were too broad and 35.18% were too narrow." 1/

Lilley then asks the rather plaintive question: given that his quite homogeneous group of subjects, all of them college graduates and all seriously interested in librarianship, could come up with more than 62 different headings, on average, for every heading actually used in the catalog, what would have happened if his test group had included a larger number of subjects with more heterogeneous interests?

In 1961, Macmillan and Welt investigated the duplicate indexing of 171 papers in a limited area of the medical sciences (1961 [389]). In only 18 percent of the cases was the indexing identical or nearly so.
About a third of the papers had been indexed so differently that there was no common correlation. For the rest, terms were used in one case that were missed in the other.

Some brief data on inter-indexer consistency is also provided by Kyle (1962 [342]) for two indexers applying her classification system to 246 arbitrarily selected French and English items in the field of political science. Of these, 160 were indexed the same way by both indexers, for a consistency figure of 70 percent. Tritschler noted that no items were indexed the same way a second time as they were the first, in small-scale experiments involving 20 documents independently indexed by 7 different people. 2/

Painter (1963 [460]), in her study of problems of duplication and consistency of subject indexing of the reports handled by the Office of Technical Services, proceeded by selecting items from the announcement bulletins of agencies contributing to OTS, having these items re-indexed in the various agencies, and comparing the results with the original indexing assignments. At ASTIA, 94 items were re-indexed, with 1,239 terms having been assigned to them originally and 1,119 assigned on the re-run. Overall, 62 percent of those terms originally assigned were also assigned the second time, and 69 percent of the second-time terms had also been assigned originally. However, 111 of the starred descriptors (which are of the most significance in the ASTIA system) were used the first time and not the second, while 98 were used the second time but not the first.

1/ Lilley, 1954 [360], pp. 42 and 43.
2/ Tritschler, 1963 [610], p. 5.
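
The two percentages quoted for the ASTIA re-indexing are directional: the terms common to both runs are divided once by the size of the original assignment and once by the size of the re-run assignment. The sketch below illustrates that arithmetic for a single item. It is not taken from Painter's study; the function name `directional_overlap` and the sample terms are hypothetical, and the reported figures are totals over 94 items rather than per-item values.

```python
def directional_overlap(first_run, second_run):
    """Return two overlap ratios for one item indexed twice.

    The first value is the share of first-run terms that reappear in the
    second run; the second value is the share of second-run terms that
    were already present in the first run.
    """
    first, second = set(first_run), set(second_run)
    if not first or not second:
        raise ValueError("each run must assign at least one term")
    common = first & second  # terms assigned on both occasions
    return len(common) / len(first), len(common) / len(second)


# Hypothetical index terms for one document, for illustration only.
original = {"indexing", "consistency", "retrieval", "thesaurus"}
rerun = {"indexing", "consistency", "classification"}

repeated, previously_seen = directional_overlap(original, rerun)
print(f"{repeated:.0%} of original terms were assigned again")
print(f"{previously_seen:.0%} of re-run terms had been assigned originally")
```

Because the two runs usually assign different numbers of terms, the two ratios differ even though the count of shared terms is the same, which is why the passage reports both.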