MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
However, some of the findings are pertinent to our present questions of evaluation.
Thus, of 492 items selected by Documentation, Inc., that ASTIA considered pertinent but
had not selected, 98 were missed by them although the proper subject heading was
searched and the catalog card had adequate selection clues, 89 were missed because not
all applicable subject headings were searched, 21 were missed because the original
subject heading assignments had been inadequate, 7 were missed because neither title nor
abstract provided indication that the report itself was pertinent to the request, and 102
were missed "because the subject heading did not occur to the searcher or because there
were so many cards under the subject heading that the searcher was discouraged". 1/
Similarly, Gull reports, of 318 items selected by ASTIA that Documentation, Inc.
personnel considered relevant but had not themselves selected, 97 were missed because
the searcher did not consult the proper terms.
7.2.1 The Cranfield Project
The inauguration of the Cranfield project is itself indicative of a prior lack of
objective standards as applied to the measurement of effectiveness of information
indexing, selection and retrieval systems. 2/ Beginning in 1957, and still continuing with
respect to individual indexing devices such as synonym controls and role indicators, this
work has attempted to compare different indexing systems (e.g., UDC, Uniterm, etc.)
under different indexing conditions (e.g., type of training of indexer, length of time
allowed to index) against proposed measures of "retrieval effectiveness". These
measures are, respectively, the recall ratio, or the percentage of relevant documents
retrieved as against the total number of relevant documents known to be in the collection,
and the relevance ratio, or the percentage of relevant documents among those actually
retrieved.
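
As an illustration only, the two ratios defined above can be computed as in the following minimal Python sketch. The function names and the sample document identifiers are hypothetical and do not come from the Cranfield reports; the sketch assumes that the full set of relevant documents in the collection is known.

```python
def recall_ratio(retrieved, relevant):
    """Percentage of the known relevant documents that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("no relevant documents are known for this request")
    return 100.0 * len(set(retrieved) & relevant) / len(relevant)


def relevance_ratio(retrieved, relevant):
    """Percentage of the retrieved documents that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("no documents were retrieved for this request")
    return 100.0 * len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical search: four documents retrieved, five known to be relevant,
# three documents common to both sets.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d3", "d5", "d6"}

print(recall_ratio(retrieved, relevant))     # 3 of 5 relevant documents found
print(relevance_ratio(retrieved, relevant))  # 3 of 4 retrieved documents relevant
```

In this hypothetical search the recall ratio is 60 percent and the relevance ratio is 75 percent.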
In the first Cranfield tests, on 18,000 documents, it is reported that the recall ratio
ranged between 75 and 85 percent for all four indexing systems. 3/ These results are
1/ Gull, 1956 [246], p. 329.
2/ Compare, for example, Randall, 1962 [492], pp. 380-381: "Prior to 1957, the
proponents of the various indexing and classification schemes, the universal
decimal system, the alphabetic subject heading, the Uniterm system and faceted
classification touted their own system on the bases of subjective evaluation and
theoretical investigations. There were many claims and much supposition about
the relative merits and benefits ... but there was no body of data from which an
objective evaluation could be made. ... Many observers believe that the Cranfield
study constitutes the most important work done in the field of cataloging in
recent times."
3/ Cleverdon, et al, 1964 [130], p. 87.