IRE
Information Retrieval Experiment
Evaluation within the environment of an operating information service
chapter
F. Wilfrid Lancaster
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Some problems of evaluation applied to operating systems
Designing the evaluation
The second step of the evaluation involves the preparation of a plan of action
that allows the gathering of data needed to answer the questions posed in the
definition of scope. The designer of the study must identify what data are
needed to answer each question and what procedures could be used to gather
the data in the most efficient and expedient way. For each question, the
evaluator must decide whether (1) it can be answered simply by collecting
data from the system as it presently exists or (2) some changes in the normal
functioning of the system must be made in order to collect the necessary data.
For example, the question 'What is the present response time of the system,
expressed in ranges, means, medians, and modes?' can be answered from the
system as it is now. It requires only the collection of data on the date and time
a request is received and the date and time the results are submitted to the
requester, for a representative sample of transactions. To answer a question
of this kind, new records may need to be created for the purpose of the study,
but, apart from record keeping, the existing system is not perturbed in any
way. In contrast, consider the question 'What would be the effect on response
time if action X were carried out?' This implies a change in the present
system, and the question can be answered only by deliberately applying
action X to a representative sample of transactions and comparing the
response times with those of the system as it normally functions.
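As a rough illustration of the first kind of question, the sketch below summarizes response times from a sample of transactions, using only the date and time a request was received and the date and time the results were submitted. The sample records, the timestamp format, and the function names are all hypothetical; they are assumptions made for the example and do not come from any particular system.

```python
from datetime import datetime
from statistics import mean, median, multimode

# Hypothetical sample of transactions: (request received, results submitted).
# These records are invented purely for illustration.
SAMPLE = [
    ("1980-03-03 09:00", "1980-03-05 09:00"),
    ("1980-03-03 10:30", "1980-03-04 10:30"),
    ("1980-03-04 14:00", "1980-03-06 14:00"),
    ("1980-03-05 11:15", "1980-03-10 11:15"),
]

FMT = "%Y-%m-%d %H:%M"  # assumed timestamp format


def response_days(received, submitted):
    """Elapsed time between receipt of a request and delivery of results, in days."""
    delta = datetime.strptime(submitted, FMT) - datetime.strptime(received, FMT)
    return delta.total_seconds() / 86400


def summarize(transactions):
    """Return the range, mean, median, and mode(s) of response time in days."""
    times = [response_days(r, s) for r, s in transactions]
    return {
        "range": (min(times), max(times)),
        "mean": mean(times),
        "median": median(times),
        "modes": multimode(times),
    }


if __name__ == "__main__":
    print(summarize(SAMPLE))
```

The same summary could be run on a second sample to which action X had been applied, giving the two sets of response times that the second kind of question requires.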
In some cases, then, the evaluator is primarily concerned with systematic
and controlled observation of the system. In other cases, however, he needs
to go beyond simple observation of this kind and into the field of experimental
design. In the evaluation programme it is important that well-established
procedures of experimental design be followed and that appropriate statistical
techniques be applied to the analysis and interpretation of the results.
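The text does not name a particular statistical technique. As one possible example, the sketch below uses a two-sided permutation test to compare response times from the system as it normally functions with response times from a sample to which action X was applied. The choice of test, the function name, and its parameters are assumptions made for illustration only.

```python
import random
from statistics import mean


def permutation_test(baseline, treated, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in mean response time.

    Returns (observed difference, approximate p-value). The difference is
    mean(treated) - mean(baseline), so a negative value means that the
    treated sample had shorter response times.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = mean(treated) - mean(baseline)
    pooled = list(baseline) + list(treated)
    n_base = len(baseline)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign transactions to the two groups at random.
        rng.shuffle(pooled)
        diff = mean(pooled[n_base:]) - mean(pooled[:n_base])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return observed, (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical response times in days, invented for illustration.
    normal = [5, 6, 7, 5, 6, 7, 6, 5]
    with_action_x = [2, 3, 2, 3, 2, 3, 3, 2]
    print(permutation_test(normal, with_action_x))
```

A small p-value suggests that the observed difference would be unlikely if action X had no effect. It does not replace a sound design, in particular the drawing of representative and comparable samples for the two conditions.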
Execution of the programme
The third step, execution of the evaluation, is the stage at which the data are
gathered once the evaluation design has been agreed on by all the parties
concerned. This stage is likely to be the longest in terms of elapsed time. It
may also be the stage in which the evaluator is least directly involved and
perhaps the stage over which he has the least direct control. Although the
execution stage can hardly begin before the design stage is completed, the
analysis and interpretation stage should certainly begin before the execution
stage is concluded; that is, the evaluator must ensure that he receives data
continuously from the beginning of the execution stage, so that they can be
reduced to a form suitable for analysis and interpretation. It should be fairly
obvious what is involved in the analysis and interpretation stage of an
evaluation project. Here the evaluator is concerned with reducing the data
and manipulating them in such a way that they can answer, or at least contribute to
answering, the questions posed in the work statement. It is not possible to
present any precise guidelines for analysis and interpretation because they
vary considerably from one evaluation application to another. In the case of
the evaluation of an information retrieval system, this stage of the study is
mainly concerned with the derivation and manipulation of performance
results (for example, recall and precision ratios) and with the analysis of