NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation chapter
Mary Elizabeth Stevens, National Bureau of Standards

Still further studies of indexer consistency, investigated at the Information Systems Operation division of General Electric, have just recently been reported (Korotkin and Oliver, 1964 [331, 332]). In particular, the investigators report on the effects of subject-matter familiarity, and of the use as a job aid of a reference list of suggested descriptors, upon inter-indexer consistency. The test material consisted of 30 abstracts drawn from Psychological Abstracts, to be indexed by 5 psychologists and 5 non-psychologists in two sessions, with and without use of the "job aid". Results in terms of mean percent consistency were reported as follows:

                              Session I    Session II
   "Group A (Familiar)          39.0%        53.0%
    Group B (Non-familiar)      36.4%        54.0%" 1/

Corroborating evidence of a generally low rate of inter-indexer consistency is provided by noting instances of duplicated indexing that may occur in regularly issued announcement bulletins. During current-awareness scanning of the DDC (ASTIA) "TAB" in recent months, members of the staff of the Research Information Center and Advisory Service on Information Processing have caught more than 20 cases of duplicate and even triplicate indexing of the same item. (Two examples can be discovered in Figure 8 a and b.) For the 52 independent assignments involved for these items, the average inter-indexer consistency is only 46.1 percent.

On the general subject of indexing consistency, Black comments as follows: "There have been enough experiments to indicate that there is no consistency, or very little, between one indexing performance by a given individual and another indexing performance, at a later date, by the same individual. The same inconsistency has been discovered among different individuals all indexing the same documents.
Thus there is neither inter-indexer consistency nor intra-indexer consistency in any system that depends on human performance." 2/

There can be little doubt that the quality and consistency of most human indexing practically available today is not good. Much of it, because of time and other pressures, is either directly a word-extraction process, or it is inconsistent in the assignment of many relevant descriptors and subject category labels. On the other hand, today's indexing, whether accomplished by man or machine, is probably no better and no worse than any other classificatory or indexing procedures. The only excuse, therefore, for choice between man and machine is the cost/benefit ratio, which is related on the one hand to specific operational considerations and on the other to the question of whether or not various indexers, and various users, would agree with the machine as much as they agree with each other. Before turning to some of the operational considerations affecting the cost/benefit ratio, however, certain special factors should be briefly mentioned.

7.4 Special Factors and Other Suggested Bases for Evaluation

The difficulties and problems of evaluation so far considered are generally applicable to any indexing system, whether manual or automatic. Certain special factors arise, however, when we consider some of the proposed automatic assignment and automatic classification techniques. In addition, the prospects for computer processing hold at least the

1/ Korotkin and Oliver, 1964 [331], p. 7.
2/ Black, 1963 [64], pp. 16-17.
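The inter-indexer consistency percentages discussed above can be illustrated with a short computational sketch. The monograph does not specify the exact formula behind the reported figures, so the measure below is an assumption: it takes consistency between two indexers as the number of descriptors they assign in common divided by the number of descriptors either assigns, expressed as a percentage (a common overlap measure in later consistency studies). The descriptor sets are hypothetical examples.

```python
# A minimal sketch of pairwise inter-indexer consistency.
# ASSUMPTION: consistency = |A intersect B| / |A union B| * 100;
# the monograph does not state the formula used in its figures.

def consistency(a, b):
    """Percent consistency between two indexers' descriptor sets."""
    a, b = set(a), set(b)
    if not (a | b):          # neither indexer assigned anything
        return 0.0
    return 100.0 * len(a & b) / len(a | b)

# Hypothetical descriptor assignments for one abstract:
indexer_1 = {"learning", "reinforcement", "rats"}
indexer_2 = {"learning", "conditioning", "rats"}

print(round(consistency(indexer_1, indexer_2), 1))  # 50.0
```

On this measure, two indexers agreeing on two of four distinct descriptors score 50 percent, which is in the same range as the 39-54 percent session means and the 46.1 percent average reported above.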