MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Other typical points made by O'Connor include the possibilities that the use of
automatic indexing techniques might free trained technical people for other work, that it
might permit more indexing than is now possible with available resources, that it might
cost less, and that it might produce a better or more consistent indexing product. 1/
With respect to the last point, however, he points out that greater consistency might not
in itself be a virtue, since the product, although generated more consistently, might be
relatively worthless by comparison with the inconsistent human product. 2/ Especially
pertinent to the question of judgment factors in evaluation was a comparison of the most
frequent words selected by the Luhn "auto-encoding" technique as applied to an ICSI paper
against a quasi-random word list for the same paper produced by selecting the last
non-common word on every page, and the first such word on every second page. He remarks:
"The important point of this quasi-random list for my present purposes is to
emphasize that first impressions might not be at all a good way of judging
the adequacy of an index set." 3/
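
The two word lists being compared can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a reconstruction of Luhn's or O'Connor's actual procedure: the common-word list `COMMON_WORDS` is a hypothetical stand-in, "every second page" is taken to mean the even-numbered pages, and the frequency-based selection is reduced to a plain count of non-common words.

```python
from collections import Counter

# Hypothetical common-word exclusion list; the source does not specify one.
COMMON_WORDS = {"the", "of", "a", "an", "and", "in", "to", "is", "for", "on", "by", "with"}


def non_common(words):
    """Return the lowercased words of one page that are not on the common-word list."""
    return [w.lower() for w in words if w.lower() not in COMMON_WORDS]


def most_frequent_words(pages, n):
    """Simplified frequency-based selection: the n most frequent non-common words."""
    counts = Counter(w for page in pages for w in non_common(page))
    return [word for word, _ in counts.most_common(n)]


def quasi_random_words(pages):
    """Last non-common word on every page, plus the first on every second page.

    Assumption: "every second page" means pages 2, 4, 6, ... (1-based).
    """
    selected = []
    for page_number, page in enumerate(pages, start=1):
        candidates = non_common(page)
        if not candidates:
            continue  # a page with only common words contributes nothing
        selected.append(candidates[-1])
        if page_number % 2 == 0:
            selected.append(candidates[0])
    return selected
```

Given `pages` as a list of word lists, one per page, the two functions return lists that can be laid side by side, which is the kind of first-impression comparison the quoted remark cautions against relying on.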
7.2.3 Questions of Comparative Costs
The paucity of objective data on the effectiveness of indexing systems generally
extends to even such obvious questions as costs of indexing and time required to index.
These very questions might, in fact, be decisive with respect to choice between manual
and machine systems. It has been estimated by some that the costs of manual subject
indexing amount to close to 75 percent of the costs of operating an information selection
and retrieval system, 4/ yet very little actual data on costs has been reported in the
literature. 5/ Exceptions are, for the most part, limited to rather special cases, such
as the following examples:
1. A total cost of less than $30,000 is reported for a 10,000 document
collection at Aeronutronic. Four man-years of effort were required.
On average, 12.6 access points were provided per document, of which
9.2 were subject-indicating descriptors chosen, with some modifications,
from the second edition of the ASTIA Thesaurus. "This favorable figure
was possible because an adequate ready-made thesaurus of indexing terms
was available and because the 'peek-a-boo' type equipment used was much
1/ O'Connor, 1962 [447], p. 267.
2/ O'Connor, 1963 [443], p. [illegible]6.
3/ O'Connor, 1962 [447], p. 270.
4/ O'Connor, 1963 [442], p. 1.
5/ See, for example, A. D. Little, Inc. (1963 [23], p. 5): "Performance and cost data
on existing large documentation systems are surprisingly sparse, and cost data
have rarely included adequate overhead and depreciation accounting."