MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation chapter
Mary Elizabeth Stevens, National Bureau of Standards

Other typical points made by O'Connor include the possibilities that automatic indexing techniques might free trained technical personnel for other work, permit more indexing than is now possible with available resources, cost less, and produce a better or more consistent indexing product. 1/ With respect to the last point, however, he notes that greater consistency is not necessarily a virtue in itself: a product generated more consistently might still be relatively worthless compared with the inconsistent human product. 2/

Especially pertinent to the question of judgment factors in evaluation was his comparison of the most frequent words selected by the Luhn "auto-encoding" technique, as applied to an ICSI paper, with a quasi-random word list for the same paper, produced by selecting the last non-common word on every page and the first such word on every second page. He remarks: "The important point of this quasi-random list for my present purposes is to emphasize that first impressions might not be at all a good way of judging the adequacy of an index set."

7.2.3 Questions of Comparative Costs

The paucity of objective data on the effectiveness of indexing systems generally extends even to such obvious questions as the cost of indexing and the time required to index. These very questions might, in fact, be decisive in choosing between manual and machine systems. Some have estimated that manual subject indexing accounts for close to 75 percent of the cost of operating an information selection and retrieval system, 4/ yet very little actual cost data has been reported in the literature. 5/ Exceptions are, for the most part, limited to rather special cases, such as the following examples:

1.
A total cost of less than $30,000 is reported for a 10,000-document collection at Aeronutronic. Four man-years of effort were required. On average, 12.6 access points were provided per document, of which 9.2 were subject-indicating descriptors chosen, with some modifications, from the second edition of the ASTIA Thesaurus. "This favorable figure was possible because an adequate ready-made thesaurus of indexing terms was available and because the 'peek-a-boo' type equipment used was much

1/ O'Connor, 1962 [447], p. 267.
2/ O'Connor, 1963 [443], p. [?]6.
3/ O'Connor, 1962 [447], p. 270.
4/ O'Connor, 1963 [442], p. 1.
5/ See, for example, A. D. Little, Inc. (1963 [23], p. 5): "Performance and cost data on existing large documentation systems are surprisingly sparse, and cost data have rarely included adequate overhead and depreciation accounting."