MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
7.2 Bases and Criteria for Evaluation of Automatic Indexing Procedures
What should the bases be for the evaluation of existing or proposed indexing systems
that rely, to a greater or lesser extent, on machine generation of the indexing or classi-
ficatory labels? Since the evaluation of quality of indexing per se raises such fundamental
and elusive questions, can these questions be begged for the case of automatic indexing as
they are in fact for almost all manual systems? If so, the obvious bases are those of
time, cost, availability of alternative possibilities, and customer acceptance. Here again
we are faced with a dearth of objective data, even for the intercomparison of any two
manual systems.
In the two years preceding the ICSI Conference, the Program Committee openly
solicited papers that would provide comparative data for operating information systems
and that would develop and discuss criteria for the comparison of systems. 1/ Nevertheless,
of the papers received, only two were responsive to this invitation: the special
case of comparing the conventional file against the inverted file approach to the searching
of chemical structure data (Miller et al., 1959 [419]), and an early report by Cleverdon
on the ASLIB Cranfield project for the intercomparison of indexing systems, under a
grant from the National Science Foundation (1959 [126]).
There had been an earlier comparative experiment, generally conceded to be the
first of its kind, 2/ in which 98 search requests were run by ASTIA personnel using a
conventional catalog and by personnel of Documentation Inc., using a coordinated uniterm
index. Warheit says:
"Unfortunately, the conditions of the test were very poorly designed so that,
in the final analysis, each group was the sole judge both of the scope of the
original request and of the adequacy of the bibliographies produced. The
resulting claims are of course contradictory." 3/
1/ See "Proposed Scope of Area 4," Proceedings, ICSI, 1959 [481], pp. 665-669.
2/ Compare, for example, Gull, 1956 [246], p. 329: "When one considers that a
fairly thorough search of the literature indicates that this comparison of two
reference systems is the first undertaken so far, it is not surprising that the
results reveal clerical errors and an incomplete design of the test."
3/ Warheit, 1956 [631], p. 274.