MONO91 NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report Problems of Evaluation chapter Mary Elizabeth Stevens National Bureau of Standards

promise of more objective measures of performance or quality than evaluative techniques available today.

Examples of the special factors involved in assignment indexing techniques and automatic classification include the question of the amount of computation required in the inversion and other manipulations of large matrices 1/ and the concomitant problems of how large a vocabulary of clue words can be used effectively and of whether some documents cannot be indexed at all because they contain none of these words. 2/ There is, as Needham says, "no merit in a classification program which can only be applied to a couple of hundred objects." 3/ In the various techniques for automatic clustering or categorization of documents, there are serious questions of whether the groupings can be conveniently named or displayed for the benefit of the user. 4/

Another example of special factors in the appraisal of an automatically generated classification scheme is as follows:

"Operational testing is displeasing in that it puts off any verification until right at the end; it is expensive; there is not much experience on how to do it in a realistic way; and it is ill-controlled in the sense that the practical performance of a system is influenced by many other factors than the classification it embodies." 5/

Examples of suggested bases for evaluation made possible by machine processing itself include proposals by Doyle and Garvin, among others. Doyle in particular suggests the substitution for the elusive concept of "relevance" of criteria based on "sharpness of separation of exploratory regions in which the searcher finds documents of interest from those in which he does not find such documents."
6/ He further emphasizes the need for discriminating a particular document from other topically close documents (Doyle, 1961 [166]) and suggests that "this decision can never be made by a human -- only by a computer, which is the only agency capable of having full consciousness of the contents of a library." 7/

Garvin considers the more general problems of language and meaning, and suggests that there are two kinds of "observable and operationally tractable manifestations of linguistic meaning", namely, translation and paraphrase, and that these may be investigated by techniques of linguistic data processing. 8/ Edmundson, however, points out that while there is in general only one translation of a document, there may be as many abstracts (and, by implication, index sets) as there are users. 9/ Thus we are back again at the questions of purpose and relevance.

1/ Compare Williams, 1963 [642], p. 162.
2/ See Maron and Borko, various references.
3/ Needham, 1963 [433], p. 8.
4/ See, for example, Doyle, 1963 [162], p. 6: "Several researchers have tried to group topically close articles, usually by statistical means, but it is rather difficult to get any benefit from this grouping unless you can represent these groups for human inspection."
5/ Needham, 1963 [432], p. 2.
6/ Doyle, 1963 [164], p. 200.
7/ Doyle, 1961 [169], p. 23.
8/ Garvin, 1961 [224], p. 137.
9/ Edmundson, 1962 [178], p. 4.
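
The clue-word concern raised in this section, that some documents cannot be indexed at all because they contain none of the chosen words, can be made concrete with a small sketch. The following is an illustration only and is not drawn from any of the systems cited: the function name `unindexable_documents`, the toy documents, and the clue-word list are all hypothetical, and the tokenization (lowercased runs of letters) is a simplifying assumption.

```python
import re


def unindexable_documents(documents, clue_words):
    """Return the ids of documents that contain none of the clue words.

    `documents` maps a document id to its text; `clue_words` is the
    indexing vocabulary. Both are hypothetical inputs for illustration.
    """
    vocabulary = {word.lower() for word in clue_words}
    missed = []
    for doc_id, text in documents.items():
        # Simplifying assumption: a token is any run of ASCII letters.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        if not tokens & vocabulary:
            missed.append(doc_id)
    return missed


# Hypothetical toy data, not taken from any cited experiment.
documents = {
    "d1": "Inversion of large matrices for automatic classification",
    "d2": "Statistical clustering of topically close articles",
    "d3": "A note on library furniture",
}
clue_words = ["matrices", "classification", "clustering", "indexing"]

missed = unindexable_documents(documents, clue_words)
# Fraction of the collection that the vocabulary can index at all.
coverage = 1 - len(missed) / len(documents)
```

Under these assumptions, enlarging the clue-word vocabulary raises coverage but also enlarges the matrices that must be manipulated, which is the trade-off the text describes.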