MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion
chapter
Mary Elizabeth Stevens
National Bureau of Standards
third and fourth questions: whether machine-generated indexes are as good as or better than
the products of human operations, and how we can measure and appraise the adequacy of
any indexing system whatever. Here are encountered the "core" problems of meaning in
communication, of information loss in any reductive transformation of actual messages or
documents, of relevance of particular messages to particular queries and to particular
human needs, and of judgments of relevance.
Because of these underlying yet overriding questions, the state-of-the-art in the
evaluation of indexing systems is in fact far more primitive than that of automatic indexing
itself. An easy, and an early, solution is not likely. Therefore, today, in appraising
machine potentials for assignment indexing we are faced with what is in effect a single
criterion: namely, will a given group of human evaluators, whatever their standards and
requirements, agree as much with the products of an automatic indexing procedure, otherwise
competitive on a cost-benefit ratio with human indexing of the same material, as they
do amongst themselves?
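This criterion can be made concrete. The following is a minimal sketch in modern terms not found in the report: each indexer's output is treated as a set of assigned terms, and the average pairwise agreement among the human evaluators (here the Jaccard overlap, an illustrative choice of measure) is compared with their average agreement with the machine product. All names and sample data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Overlap of two index-term sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if a | b else 1.0

def mean_pairwise_agreement(term_sets):
    """Average Jaccard agreement over all pairs of indexers."""
    pairs = list(combinations(term_sets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

def machine_meets_criterion(human_sets, machine_set):
    """True if the evaluators agree with the machine product at least
    as much as they agree amongst themselves."""
    human_human = mean_pairwise_agreement(human_sets)
    human_machine = sum(jaccard(h, machine_set) for h in human_sets) / len(human_sets)
    return human_machine >= human_human

# Hypothetical term assignments for a single document.
humans = [{"indexing", "automation", "evaluation"},
          {"indexing", "automation", "retrieval"},
          {"indexing", "evaluation", "retrieval"}]
machine = {"indexing", "automation", "evaluation"}
print(machine_meets_criterion(humans, machine))  # prints True
```

In this example the human indexers agree with one another at an average overlap of 0.5, while their average overlap with the machine assignment is about 0.67, so the machine product satisfies the criterion as stated.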
Within the limits of small, specially selected samples of document or message
collections, it is possible to demonstrate that:
(1) Replication of the products of at least some existing systems, within the
consistency levels observed for these systems, can be achieved.
(2) Retrieval effectiveness with respect to relevant items indexed by automatic
assignment procedures can be at least as good as, and may be
superior to, that obtained from run-of-the-mill manual indexing of the
same items.
(3) Costs of indexing can be held at or below the costs of equivalent manual
indexing, provided that the input material required is either already in
machine-usable form or can be held to an average of, say, 100 words or
less, and that the clue-word lists, association factors, or probabilistic
calculations can be accommodated within internal memory.
(4) Significant gains in time required to generate an index or to index or
re-index a collection can be achieved.
Some degree of theoretical success in assignment indexing by machine can thus certainly
be claimed. Moreover, many of the test results reported do clearly indicate a quality of
indexing, for a given collection at a given level of specificity of indexing, at least
comparable to that which is typically and routinely achieved by people in a practical
indexing situation. No more should be asked of the automatic techniques unless better
human indexing can be specified as being equally feasible, timely, and practical. Further, no more
should be asked of automatic techniques in terms of the evaluation of their potentialities,
than is now asked of the manually-prepared alternatives. 1/
Data with respect to comparison of the results of automatic assignment indexing
techniques to either a priori or a posteriori human judgment have been mentioned
previously in this report in terms of actual test results, and the most significant of these
reported data are summarized in Table 2. 2/ Typically, however, these data reflect, in
varying degrees, so small a sample of test cases, of user preferences, and/or of special
purpose and interest, that no general extrapolation is reasonable. Moreover, the
general "core" problems of evaluation again rear their ugly heads.
1/ Compare, for example, Kennedy, 1962 [311] and Needham, 1963 [433].
2/ See pp. 101-103 of this report.