MONO91
NIST Monograph 91: Automatic Indexing: A State-of-the-Art Report
Problems of Evaluation
chapter
Mary Elizabeth Stevens
National Bureau of Standards
Similarly, Baxendale states:
"We are confronted with difficulties which arise from the multiple ways in
which words and sentences are put together to convey meanings and shades
of meaning -- i.e., to represent ideas and concepts. Research into this
problem -- drawing upon psychological and logical analysis -- is scarcely
begun." 1/
A third core problem is the proper choice of appropriate selection criteria if
condensed representations of document content must be used for scanning, search, and
relevance decisions. Swanson suggests that the price paid for brevity of representation
so that searching operations can be efficiently managed is the loss of at least some,
perhaps most, of the information in a collection or library. He notes also that:
"It is another obvious but seldom remarked fact that the extent of such
information loss for existing libraries is not only unknown but has never
been defined in measurable terms." 2/
This loss is lived with, today, in many practical situations involving abstracts, index
term sets, selective-dissemination notices, and even mere author-title listings in
announcement bulletins or search output products from either manual or machine
searches. Yet the sheer increase in volume of the total number of items to be covered
and of the number of items potentially responsive even to a single individual's interests
has severely stretched any individual's capacity to scan or skim, much less read, the
presumably pertinent material -- documents themselves, abstracts of other documents,
listings of documents available -- already accumulating on his desk.
Condensation, reductive representation, becomes more and more imperative.
Concurrently, while conventional tools may be lived with, after a fashion, the
substitution of machine-compiled or machine-produced alternatives, even though they give
the same information in the same volume (number of pages to be scanned), may, because
of such things as inferiorities of page and line formatting, size of type on the page,
and limitation of typography to upper case and a few other symbols, make the problem of how
adequate the user judges the selection and condensation to be that much worse.
A fourth problem in evaluation, therefore, is the question of whether or not the
benefit to users is worth the cost. For example, despite the arguments for concept
rather than word indexing, for assignment of labels rather than mere extraction of a few
words used by the author himself, at least some data on the use made by scientists of
various sources of information on material which might be of interest to them suggests
1/ Baxendale, 1962 [42], p. 68.
2/ Swanson, 1960 [582], pp. 5-6.