MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Problems of Evaluation chapter Mary Elizabeth Stevens National Bureau of Standards

7.2 Bases and Criteria for Evaluation of Automatic Indexing Procedures

What should the bases be for the evaluation of existing or proposed indexing systems that rely, to a greater or lesser extent, on machine generation of the indexing or classificatory labels? Since the evaluation of quality of indexing per se raises such fundamental and elusive questions, can these questions be begged for the case of automatic indexing as they are in fact for almost all manual systems? If so, the obvious bases are those of time, cost, availability of alternative possibilities, and customer acceptance.

Here again we are faced with a dearth of objective data, even for the intercomparison of any two manual systems. In the two years preceding the ICSI Conference, the Program Committee openly solicited papers that would provide comparative data for operating information systems and that would develop and discuss criteria for the comparison of systems. 1/ Nevertheless, of the papers received only two were responsive to this invitation: the special case of comparing the conventional file against the inverted file approach to the searching of chemical structure data (Miller et al., 1959 [419]), and an early report by Cleverdon on the ASLIB Cranfield project for the intercomparison of indexing systems, under a grant from the National Science Foundation (1959 [126]).

There had been an earlier comparative experiment, generally conceded to be the first of its kind, 2/ in which 98 search requests were run by ASTIA personnel using a conventional catalog and by personnel of Documentation Inc., using a coordinated uniterm index.
Warheit says: "Unfortunately, the conditions of the test were very poorly designed so that, in the final analysis, each group was the sole judge both of the scope of the original request and of the adequacy of the bibliographies produced. The resulting claims are of course contradictory." 3/

1/ See "Proposed Scope of Area 4," Proceedings, ICSI, 1959 [481], pp. 665-669.
2/ Compare, for example, Gull, 1956 [246], p. 329: "When one considers that a fairly thorough search of the literature indicates that this comparison of two reference systems is the first undertaken so far, it is not surprising that the results reveal clerical errors and an incomplete design of the test."
3/ Warheit, 1956 [631], p. 274.