NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Conclusion chapter
Mary Elizabeth Stevens, National Bureau of Standards

third and fourth questions: whether machine-generated indexes are as good or better than the products of human operations and of how we can measure and appraise the adequacy of any indexing system whatever. Here are encountered the "core" problems of meaning in communication, of information loss in any reductive transformation of actual messages or documents, of relevance of particular messages to particular queries and to particular human needs, of judgments of relevance. Because of these underlying yet overriding questions, the state-of-the-art in the evaluation of indexing systems is in fact far more primitive than that of automatic indexing itself. An easy, and an early, solution is not likely.

Therefore, today, in appraising machine potentials for assignment indexing we are faced with what is in effect a single criterion: namely, will a given group of human evaluators, whatever their standards and requirements, agree as much with the products of an automatic indexing procedure, otherwise competitive on a cost-benefit ratio with human indexing of the same material, as they do amongst themselves?

Within the limits of small, specially selected samples of document or message collections, it is possible to demonstrate that:

(1) Replication of the products of at least some existing systems, within the consistency levels observed for these systems, can be achieved.

(2) Retrieval effectiveness with respect to relevant items indexed by automatic assignment procedures can be at least as good as, and may be superior to, that obtained from run-of-the-mill manual indexing of the same items.
(3) Costs of indexing can be held at or below the costs of equivalent manual indexing, provided both that the input material required is already in machine-usable form, or can be held to an average of, say, 100 words or less, and that the clue-word lists, association factors, or probabilistic calculations can be accommodated within internal memory.

(4) Significant gains in time required to generate an index or to index or reindex a collection can be achieved.

Some degree of theoretical success in assignment indexing by machine can thus certainly be claimed. Moreover, many of the test results reported do clearly indicate a quality of indexing, for a given collection at a given level of specificity of indexing, at least comparable to that which is typically and routinely achieved by people in a practical indexing situation. No more should be asked of the automatic techniques unless better human indexing can be specified as being equally feasible, timely, and practical. Further, no more should be asked of automatic techniques, in terms of the evaluation of their potentialities, than is now asked of the manually prepared alternatives. 1/

Data with respect to comparison of the results of automatic assignment indexing techniques to either a priori or a posteriori human judgment have been mentioned previously in this report in terms of actual test results reported, and the most significant of these reported data are summarized in Table 2. 2/ Typically, however, these data reflect, in varying degrees, so small a sample of test cases, of user preferences, and/or of special purpose and interest, that no general extrapolation is reasonable. Moreover, the general questions of the "core" problems of evaluation in general again rear their own ugly heads.

1/ Compare, for example, Kennedy, 1962 [311] and Needham, 1963 [433].
2/ See pp. 101-103 of this report.
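The single criterion described above, whether evaluators agree with a machine's assigned index terms at least as much as they agree amongst themselves, can be sketched as a simple overlap computation. The term sets and the Jaccard-style consistency measure below are illustrative assumptions for the sketch, not data or a formula taken from the monograph.

```python
from itertools import combinations

def consistency(a, b):
    """Jaccard-style overlap between two sets of assigned index terms."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

# Hypothetical term assignments for one document (invented for illustration).
human_indexers = [
    {"indexing", "automation", "evaluation"},
    {"indexing", "automation", "retrieval"},
    {"indexing", "evaluation", "retrieval"},
]
machine = {"indexing", "automation", "evaluation"}

# Mean agreement among the human indexers themselves.
human_pairs = [consistency(a, b) for a, b in combinations(human_indexers, 2)]
human_agreement = sum(human_pairs) / len(human_pairs)

# Mean agreement between the machine's assignments and each human indexer.
machine_scores = [consistency(machine, h) for h in human_indexers]
machine_agreement = sum(machine_scores) / len(machine_scores)

# The criterion: the machine's indexing passes if it agrees with the
# evaluators at least as well as they agree amongst themselves.
print(machine_agreement >= human_agreement)
```

On these invented sets the humans agree at 0.5 on average while the machine agrees with them at about 0.67, so the criterion is met; real evaluations of the period used small samples of this kind, which is exactly the extrapolation problem the text notes.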