unequivocal factual answers. The answer to such a question as 'What is the
melting point of ...?' is either supplied completely and correctly or it is not.
Question-answering services, whether the answer is supplied from a printed
source or a machine-readable data bank, must therefore be evaluated in
terms of the completeness and accuracy of the data supplied.
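A minimal sketch of how such a tally might be computed, assuming each answer has been judged for completeness and accuracy against an authoritative source (the function name and data structure are illustrative assumptions, not from the chapter):

```python
# Hypothetical scoring of a question-answering service on the two
# criteria named above: each judged answer is a (complete, accurate)
# pair of booleans, and the service's scores are the fractions of
# answers judged complete and judged accurate.

def score_answers(judgements):
    """judgements: list of (complete, accurate) boolean pairs."""
    n = len(judgements)
    completeness = sum(1 for complete, _ in judgements if complete) / n
    accuracy = sum(1 for _, accurate in judgements if accurate) / n
    return completeness, accuracy

# Invented example: four answers judged; three complete, two accurate.
print(score_answers([(True, True), (True, False), (True, True), (False, False)]))
# -> (0.75, 0.5)
```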
6.3 Some problems of evaluation applied to operating systems
In a book on experimentation in information retrieval a chapter on the
evaluation of operating systems may be considered something of an
interloper. Evaluation does not necessarily imply experimentation. In fact,
the evaluation of an operating system will not usually involve any
experimentation. It will merely be an analysis of the performance of the
system at a particular point in time. This does not mean that experimentation
is impossible within an operating environment. It is possible, but it might be
quite difficult. Although most evaluations are not framed in this way, an
evaluation of an operating system can take the form of a formal research
project with a hypothesis that the investigators set out to test. An example of
such a hypothesis might be 'Literature searches conducted for requestors
who visit the centre in person produce better results in terms of recall and
precision than those conducted for users who submit their requests by mail
to the centre'. It is possible to build an evaluation upon a research hypothesis
of this type. But, unlike the true experimental situation, in an operating
environment it may be quite difficult to control all the extraneous variables
that may affect the results.
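Recall and precision themselves are simple ratios, so the comparison such a hypothesis calls for is easy to state in code. A minimal sketch, with invented counts standing in for real relevance judgements (nothing here is data from any actual study):

```python
# Recall: fraction of the relevant items in the collection that a
# search retrieved. Precision: fraction of the retrieved items that
# were relevant. All counts below are invented for illustration.

def recall(relevant_retrieved, relevant_in_collection):
    return relevant_retrieved / relevant_in_collection

def precision(relevant_retrieved, total_retrieved):
    return relevant_retrieved / total_retrieved

# One hypothetical in-person search and one hypothetical mail search.
in_person = (recall(8, 10), precision(8, 20))  # (0.8, 0.4)
by_mail = (recall(5, 10), precision(5, 25))    # (0.5, 0.2)
print(in_person, by_mail)
```

Testing the hypothesis would then mean comparing such figures across many searches in each group, subject to the difficulty of uncontrolled variables just noted.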
This is not to imply that controls should not be sought. Variables extraneous
to the focus of the study should be controlled as much as possible. In the
evaluation of an operating system, established principles of experimental
design, sampling, survey methodology, and related methodological matters are
as relevant and important as they are in any other evaluation situation. It
must be recognized, however, that, when dealing with a real-life environment,
some methodological compromises may need to be made.
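One common compromise of this kind is to balance a known extraneous variable by stratified sampling rather than by strict experimental control. A hedged sketch, assuming subject area is the variable to be balanced (the strata and function are illustrative only):

```python
# Hypothetical stratified random sampling of search requests: draw the
# same number of requests from each subject area, so that subject, an
# assumed extraneous variable, is balanced in the evaluation sample.
import random
from collections import defaultdict

def stratified_sample(requests, per_stratum, seed=0):
    """requests: list of (request_id, subject_area) pairs."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for request_id, subject in requests:
        strata[subject].append(request_id)
    sample = []
    for ids in strata.values():
        sample.extend(rng.sample(ids, min(per_stratum, len(ids))))
    return sample
```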
While the evaluation may not be based on a formal research hypothesis, it
should certainly have some clearly defined evaluation objectives.
The major steps involved in the conduct of an evaluation programme are
the following:
(1) Definition of the scope of the evaluation.
(2) Design of the evaluation programme.
(3) Execution of the evaluation.
(4) Analysis and interpretation of the results.
(5) Modification of the system or service on the basis of the evaluation results.
Definition of scope
The first step, the definition of scope, entails the preparation of a precise set of
questions that the evaluation must be designed to answer. The purpose of an
evaluation is to learn more about the capabilities and weaknesses of a system