NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
Mary Elizabeth Stevens
National Bureau of Standards
Still further studies of indexer consistency, conducted at the Information Systems
Operation division of General Electric, have recently been reported (Korotkin and
Oliver, 1964 [331, 332]). In particular, the investigators report on the effects of subject
matter familiarity, and of the use of a reference list of suggested descriptors as a job
aid, upon inter-indexer consistency. The test material consisted of 30 abstracts drawn
from Psychological Abstracts, indexed by 5 psychologists and 5 non-psychologists in
two sessions, with and without use of the "job aid". Results in terms of mean percent
consistency were reported as follows:
                               Session I    Session II
    "Group A (Familiar)          39.0%        53.0%
     Group B (Non-familiar)      36.4%        54.0%" 1/
Corroborating evidence of a generally low rate of inter-indexer consistency is
provided by noting instances of duplicated indexing that may occur in regularly issued
announcement bulletins. During current awareness scanning of the DDC (ASTIA) "TAB"
in recent months, members of the staff of the Research Information Center and Advisory
Service on Information Processing have caught more than 20 cases of duplicate and even
triplicate indexing of the same item. (Two examples are shown in Figures 8a and
8b.) For the 52 independent assignments involved in these items, the average inter-
indexer consistency is only 46.1 percent.
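The monograph does not reproduce the formula by which these consistency percentages were computed. Purely as an illustration, the following Python sketch assumes the familiar overlap measure, in which the descriptors two indexers assign in common are divided by all descriptors assigned by either indexer; the descriptor sets shown are hypothetical, not taken from the TAB entries.

    from itertools import combinations

    def pairwise_consistency(terms_a, terms_b):
        """Percent consistency between two indexers' descriptor sets,
        assumed here to be shared terms over the union of all terms."""
        a, b = set(terms_a), set(terms_b)
        if not (a | b):
            return 0.0
        return 100.0 * len(a & b) / len(a | b)

    def mean_consistency(assignments):
        """Average pairwise consistency over all indexers of one item."""
        pairs = list(combinations(assignments, 2))
        return sum(pairwise_consistency(a, b) for a, b in pairs) / len(pairs)

    # Hypothetical descriptor sets for one duplicated announcement entry:
    indexer_1 = {"information retrieval", "indexing", "consistency"}
    indexer_2 = {"indexing", "consistency", "evaluation"}
    print(round(pairwise_consistency(indexer_1, indexer_2), 1))  # 50.0

Averaging such pairwise figures over all indexer pairs for each duplicated item would yield an overall mean of the kind cited above, though the measure actually used in the studies reported may differ.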
On the general subject of indexing consistency, Black comments as follows:
"There have been enough experiments to indicate that there is no consistency, or
very little, between one indexing performance by a given individual and another
indexing performance, at a later date, by the same individual. The same inconsis-
tency has been discovered among different individuals all indexing the same docu-
ments. Thus there is neither inter-indexer consistency nor intra-indexer consis-
tency in any system that depends on human performance." 2/
There can be little doubt that the quality and consistency of most human indexing
practically available today are not good. Much of it, because of time and other pressures,
is either a direct word-extraction process or is inconsistent in the assignment of many
relevant descriptors and subject category labels. On the other hand, today's indexing,
whether accomplished by man or machine, is probably no better and no worse than any
other classificatory or indexing procedure. The only basis, therefore, for a choice
between man and machine is the cost-benefit ratio, which is related on the one hand to
specific operational considerations and on the other to the question of whether
various indexers, and various users, would agree with the machine as much as they agree
with each other.
Before turning to some of the operational considerations affecting the cost-benefit
ratio, however, certain special factors should be briefly mentioned.
7.4 Special Factors and Other Suggested Bases for Evaluation
The difficulties and problems of evaluation so far considered are generally applicable
to any indexing system, whether manual or automatic. Certain special factors arise,
however, when we consider some of the proposed automatic assignment and automatic
classification techniques. In addition, the prospects for computer processing hold at least the
1/ Korotkin and Oliver, 1964 [331], p. 7.
2/ Black, 1963 [64], pp. 16-17.