MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
rather better than reported by others 1/ - and have been subjected to specific criticisms
although these first tests were limited to the recall of the source documents on which
the test questions were based. For non-source documents there would of course also
be questions relating to the core problem of how relevance is to be judged. Thus Markus
says:
"Despite investigations by Cleverdon in England, and by many others, there is
today no generally accepted method of comparing the effectiveness of different
types of indexes. The needs of index users vary so greatly that even the most
carefully planned tests of retrieval efficiency can be challenged." 2/
Notwithstanding such criticisms, however, and in spite of the fact that the Cranfield
tests have so far been directed principally to indexing systems applied manually, certain
findings and conclusions reached by Cleverdon and his associates are pertinent to the
questions of evaluating automatic indexing procedures. Examples are:
"The fact is that no indexing sleight of hand, no indexing skill, can produce a
system in which a figure for recall can be improved substantially without
weakening the over-all relevance, i.e., the number of documents that are
really relevant compared with the total number retrieved.
"The majority of the failures (60 percent) were due to inadequacies and inaccuracies
(carelessness rather than lack of knowledge) in the indexing process.
However, supplementary tests, in which the staff of outside organizations carried
out the indexing, revealed that the Cranfield indexers were achieving a standard
above average. This seems to indicate a certain inevitability of human weakness
and error in the indexing process and lends some support to the many current
research projects that are investigating the feasibility of automatic indexing." 3/
7.2.2 O'Connor's Investigations
As O'Connor has cogently observed on a number of occasions, the question of
whether or not automatic indexing is possible is not the real question. Rather, the
problem is whether or not indexing by machine is capable of producing results that are
"good enough" for retrieval purposes, raising in its turn the still more basic question of
how "good retrieval" can be evaluated. His own approach in detailed investigations has
1/ See, for example, Johnson 1962 [300], p. 90: "The amount of meaningful
information that can be retrieved is too small. There are few available studies
information that can be retrieved is too small. There are few available studies
on this subject. But these seem to indicate that, under some indexing schemes,
meaningful retrieval can run as low as 10 and 15 percent and that the most that
can be optimized for any of them, even under highly motivated conditions, is
around 70 percent."
2/ Markus, 1963 [394], p. 16. See also Kochen, 1963 [327], p. 12: "The outstanding
large-scale and realistic experimental work is that of Cleverdon. Unfortunately,
his results are not very decisive."
3/ Cleverdon et al., 1964 [130], pp. 86-87.