MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
"If the answer turns out to be 'no', we might reasonably conclude that the only
reliable and effective kind of human indexing is that which is already machine-
like in nature." 1/
With a few noteworthy exceptions, there has been very little serious investigation of these
problems and there is very little comparative data.
O'Connor has been making a series of studies, with considerable emphasis upon how
one might measure the products of machine indexing and how one might derive machine
rules for automatic indexing from systematic review of documents indexed by people.
Cleverdon and his associates at the ASLIB Cranfield project have extensively tested
several different indexing procedures. Painter, MacMillan and Welt, Slamecka and
Zunde, and others report findings on intra-indexer and inter-indexer consistency --
unfortunately, on the basis of quite small samples. Various alternate approaches to the
evaluation of automatic indexing results have been considered by Borko, Doyle, Swanson,
Savage, Giuliano, and others. In addition, some data bearing on these questions have
been reported in connection with analyses of selective dissemination (SDI) systems.
Some data from other sources, such as studies of user preferences with respect to
various reference and search tools, are also pertinent.
The most generally accepted criterion for appraising the effectiveness of indexing
is that of retrieval effectiveness. But, in general, this is merely the substitution of
one intangible for another, entailing a string of as yet unanswerable or at least
unresolved questions. 2/ Retrieval of what, for whom, and when? How can effectiveness be
measured except by the elusive question of relevance judgments? How can human judg-
ments of relevance and value be measured and quantified?
We shall try to distinguish here, insofar as possible, between the core problems
that make the evaluation of indexing as such an extremely difficult task, the available
data on human indexer reliability, and the possible advantages and disadvantages of
automatic indexing techniques.
1/ Montgomery and Swanson, 1962 [421], p. 366.
2/ Compare Swanson, 1960 [582], pp. 2-3: "The performance of retrieval experi-
ments when relevance judgments per se cannot be consistently assessed by human
judgment would seem to represent overly vigorous pursuit of a solution before
identifying the problem." Similarly, see Black, 1963 [64], p. 14: "Finally,
when one is faced with an existing collection of indexed materials, how does one
assess the effectiveness of any retrieval system? Suppose that one receives 20
documents as a result of a query to the system. Suppose further that all 20 docu-
ments are quite pertinent to the topic of interest. Is there any way to assess the
amount of pertinent information still unretrieved from the file? Or is there any
way of learning whether the retrieved information is more pertinent than the un-
retrieved information? The answer is 'No' -- the use of any retrieval system
is, then, an act of faith in the quality of indexing."