MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
promise of more objective measures of performance or quality than evaluative techniques
available today.
Examples of the special factors involved in assignment indexing techniques and
automatic classification include the question of the amount of computation required in the
inversion and other manipulations of large matrices 1/ and the concomitant problems of
how large a vocabulary of clue words can be used effectively and of whether some
documents cannot be indexed at all because they contain none of these words. 2/ There is, as
Needham says, "no merit in a classification program which can only be applied to a couple
of hundred objects." 3/
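The concern that some documents contain none of the clue words can be checked directly before an assignment indexing run. The following is a minimal sketch only, not a method taken from any of the cited authors: the function name `unindexable`, the simple whitespace tokenization, and the toy documents and clue list are all assumptions made for illustration.

```python
def unindexable(documents, clue_words):
    """Return the ids of documents that contain none of the clue words.

    documents: dict mapping a document id to its text (hypothetical format).
    clue_words: iterable of clue words (hypothetical vocabulary).
    """
    clues = {w.lower() for w in clue_words}
    missing = []
    for doc_id, text in documents.items():
        # Crude tokenization: split on whitespace, strip common punctuation.
        tokens = {t.strip('.,;:!?"()').lower() for t in text.split()}
        if not (tokens & clues):
            missing.append(doc_id)
    return missing


# Toy data, invented for illustration only.
docs = {
    "d1": "Inversion of large matrices for automatic classification.",
    "d2": "A note on library furniture.",
    "d3": "Statistical clustering of documents.",
}
clues = ["matrices", "clustering", "classification"]

missing = unindexable(docs, clues)
coverage = 1 - len(missing) / len(docs)  # fraction of documents that can be indexed
print(missing, round(coverage, 2))
```

A realistic check would need stemming and a vocabulary of realistic size; the point here is only that the proportion of documents left unindexed is a measurable quantity.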
In the various techniques for automatic clustering or categorization of documents,
there are serious questions of whether the groupings can be conveniently named or dis-
played for the benefit of the user. 4/ Another example of special factors in the appraisal
of an automatically generated classification scheme is as follows:
"Operational testing is displeasing in that it puts off any verification until right at the
end; it is expensive; there is not much experience on how to do it in a realistic way;
and it is ill-controlled in the sense that the practical performance of a system is
influenced by many other factors than the classification it embodies." 5/
Examples of suggested bases for evaluation made possible by machine processing
itself include proposals by Doyle and Garvin, among others. Doyle in particular suggests
the substitution for the elusive concept of "relevance" of criteria based on "sharpness of
separation of exploratory regions in which the searcher finds documents of interest from
those in which he does not find such documents." 6/ He further emphasizes the need for
discriminating a particular document from other topically close documents (Doyle, 1961
[166]) and suggests that "this decision can never be made by a human--only by a
computer, which is the only agency capable of having full consciousness of the contents of a
library." 7/ Garvin considers the more general problems of language and meaning, and
suggests that there are two kinds of "observable and operationally tractable manifestations
of linguistic meaning", namely, translation and paraphrase, and that these may be
investigated by techniques of linguistic data processing. 8/ Edmundson, however, points
out that while there is in general only one translation of a document, there may be as many
abstracts (and, by implication, index sets) as there are users. 9/ Thus we are back again
at the questions of purpose and relevance.
1/ Compare Williams, 1963 [642], p. 162.
2/ See Maron and Borko, various references.
3/ Needham, 1963 [433], p. 8.
4/ See, for example, Doyle, 1963 [162], p. 6: "Several researchers have tried to
group topically close articles, usually by statistical means, but it is rather difficult
to get any benefit from this grouping unless you can represent these groups for
human inspection."
5/ Needham, 1963 [432], p. 2.
6/ Doyle, 1963 [164], p. 200.
7/ Doyle, 1961 [169], p. 23.
8/ Garvin, 1961 [224], p. 137.
9/ Edmundson, 1962 [178], p. 4.