Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
`given') and focus instead on one of the other levels. Most of the evaluations
of operating systems have, in fact, been restricted to evaluations of their
effectiveness (e.g. in terms of the number of users who express subjective
satisfaction or the number of actual demands that are satisfied according to
some more objective criteria). Few detailed cost analyses have been
conducted or, at least, few are reported in the literature. And realistic cost-
effectiveness analyses are even more scarce. This is a pity because it can be
argued that a study of effectiveness has little real meaning unless related to
costs and that, certainly, a cost analysis has little real value unless related to
level of effectiveness. Managers of information services are, or should be,
concerned with optimum allocation of the resources available (i.e. one that
achieves the maximum quality of service possible within budgetary
constraints) and optimum resource allocation is only likely to come from a
true cost-effectiveness analysis.
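By way of illustration, a true cost-effectiveness figure can be as simple as the cost incurred per demand satisfied; the short sketch below (a hypothetical example, with invented figures and function names not taken from this chapter) shows how two alternative allocations of the same budget might be compared on that basis.

```python
# Hypothetical cost-effectiveness sketch (names and figures are
# illustrative, not taken from the chapter): relate the cost of running
# a service to the number of demands it satisfies, so that alternative
# allocations of the same budget can be compared.

def cost_per_satisfied_demand(total_cost, demands_satisfied):
    """Cost-effectiveness expressed as cost per demand satisfied."""
    if demands_satisfied == 0:
        return float("inf")          # money spent, no effectiveness
    return total_cost / demands_satisfied

# Two ways of spending the same budget of 50,000 units:
option_a = cost_per_satisfied_demand(50_000, demands_satisfied=2_800)
option_b = cost_per_satisfied_demand(50_000, demands_satisfied=3_300)
print(f"Option A: {option_a:.2f} per satisfied demand")
print(f"Option B: {option_b:.2f} per satisfied demand")
```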
A useful distinction, first made by King and Bryant¹³, is that between
macroevaluation and microevaluation. A macroevaluation of a system is one
that measures its present level of performance (e.g. in terms of recall and
precision or as a document delivery score) and is content to let the study rest
there. A macroevaluation, then, merely establishes a benchmark. But a
microevaluation goes well beyond this. It seeks to answer such questions as
`Why is the system operating at this level?', `Under what conditions does the
system perform well and under what conditions does it perform badly?', and
`What can be done to raise the level of performance in the future?' A
microevaluation, then, is diagnostic while a macroevaluation is not.
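The benchmark figures that a macroevaluation produces are straightforward to compute once relevance judgements are available; the following sketch (illustrative names and data, not taken from this chapter) derives recall and precision for a single search.

```python
# Sketch of a macroevaluation benchmark: recall and precision for one
# search, given the set of documents retrieved and the set judged
# relevant. (Names and figures are illustrative, not from the chapter.)

def recall_precision(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

r, p = recall_precision(retrieved=["d1", "d2", "d3", "d7"],
                        relevant=["d2", "d3", "d5"])
print(f"recall={r:.2f}, precision={p:.2f}")
# A microevaluation would go on to ask why d5 was missed and why
# d1 and d7 were retrieved although not relevant.
```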
Another possibly useful distinction in the information services environment
is that between inputs, outputs and outcomes. Again, this is a sequence of
increasing complexity. Inputs to an information service are the easiest things
to measure. They can be expressed in purely quantitative terms: how many
documents, how many people, how much money? Outputs are more difficult
to deal with because output measures must take into account quality as well
as quantity. For example, in the evaluation of a question-answering service
the appropriate output measure is not the number of questions submitted. It
is not even the proportion of questions for which an answer is supplied. It is
the proportion of questions submitted for which a complete and correct
answer is supplied. The outcome of an information service is the most
difficult aspect to study, for the notion of outcome brings us back to that of
impact, effect or benefit. It is more difficult to evaluate outcomes than it is to
evaluate outputs and it is more difficult to evaluate outputs than it is to
quantify inputs. All types of information services will probably have reliable
input data but few have meaningful qualitative output data and data on
outcomes are likely to be non-existent. Where standards exist in the information
services field (e.g. for various types of libraries), they tend to relate
entirely to inputs. This is not because inputs are most important (far
from it) but merely because inputs are easiest to look at, quantify and reduce
to `standard' form.
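The output measure suggested above for a question-answering service, the proportion of questions submitted for which a complete and correct answer is supplied, can be computed directly once each question has been judged; the sketch below uses assumed record fields and invented figures purely for illustration.

```python
# Sketch of the output measure for a question-answering service: the
# proportion of questions submitted that receive a complete and correct
# answer. Field names and figures are assumed for illustration.

def answer_success_rate(questions):
    """questions: iterable of dicts with a boolean 'complete_and_correct'."""
    questions = list(questions)
    if not questions:
        return 0.0
    return sum(q["complete_and_correct"] for q in questions) / len(questions)

sample = [
    {"id": 1, "complete_and_correct": True},
    {"id": 2, "complete_and_correct": False},  # answered, but incomplete
    {"id": 3, "complete_and_correct": True},
    {"id": 4, "complete_and_correct": False},  # no answer supplied at all
]
print(f"{answer_success_rate(sample):.2f} of questions fully and correctly answered")
```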
In the evaluation of an operating information service we should primarily
be interested in its outcomes. After all, it is the beneficial outcomes that
presumably justify the existence of the service. But it may not be possible to
evaluate outcomes; or, at least, the evaluation of outcomes may be so complex
as to discourage the attempt. On the other hand it should be possible to