IRE Information Retrieval Experiment Evaluation within the environment of an operating information service chapter F. Wilfrid Lancaster Butterworth & Company Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Designing the evaluation

The second step of the evaluation involves the preparation of a plan of action that allows the gathering of the data needed to answer the questions posed in the definition of scope. The designer of the study must identify what data are needed to answer each question and what procedures could be used to gather those data in the most efficient and expedient way. For each question, the evaluator must decide whether (1) it can be answered simply by collecting data from the system as it presently exists or (2) some changes in the normal functioning of the system must be made in order to collect the necessary data.

For example, the question 'What is the present response time of the system, expressed in ranges, means, medians, and modes?' can be answered from the system as it is now. It requires only the collection of data on the date and time a request is received and the date and time the results are submitted to the requester, for a representative sample of transactions. To answer a question of this kind, new records may need to be created for the purpose of the study, but, apart from record keeping, the existing system is not perturbed in any way. In contrast, consider the question 'What would be the effect on response time if action X were carried out?'
This implies a change in the present system, and the question can be answered only by deliberately applying action X to a representative sample of transactions and comparing the response times with those of the system as it normally functions. In some cases, then, the evaluator is primarily concerned with systematic and controlled observation of the system. In other cases, however, he needs to go beyond simple observation of this kind and into the field of experimental design. In the evaluation programme it is important that well-established procedures of experimental design be followed and that appropriate statistical techniques be applied to the analysis and interpretation of the results.

Execution of the programme

The third step, execution of the evaluation, is the stage at which the data are gathered once the evaluation design has been agreed on by all the parties concerned. This stage is likely to be the longest in terms of elapsed time. It may also be the stage in which the evaluator is least directly involved and perhaps the stage over which he has the least direct control. Although the execution stage can hardly begin before the design stage is completed, the analysis and interpretation stage should certainly begin before the execution stage is concluded; that is, the evaluator must ensure that he receives data continuously from the beginning of the execution stage, so that they can be reduced to a form suitable for analysis and interpretation.

It should be fairly obvious what is involved in the analysis and interpretation stage of an evaluation project. Here the evaluator is concerned with reducing the data and manipulating them in such a way that they can answer, or at least contribute to answering, the questions posed in the work statement. It is not possible to present any precise guidelines for analysis and interpretation because they vary considerably from one evaluation application to another.
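As an illustration of the data reduction described above, the following is a minimal sketch of how the response-time question might be answered from a sample of transaction records. Everything in it is an assumption made for illustration: the record layout, the timestamp format, the sample values, and the function names `response_hours`, `summarize`, and `mean_shift` are hypothetical and do not come from any actual study. The comparison of a baseline sample with a sample to which action X was applied is shown only as a raw difference in means; the choice of a proper significance test belongs to the experimental design.

```python
from datetime import datetime
from statistics import mean, median, multimode

# Assumed timestamp format for the hypothetical transaction records.
FMT = "%Y-%m-%d %H:%M"

# Illustrative sample: (request received, results submitted to requester).
SAMPLE = [
    ("2024-01-01 09:00", "2024-01-02 09:00"),
    ("2024-01-01 10:00", "2024-01-03 10:00"),
    ("2024-01-02 08:00", "2024-01-03 08:00"),
    ("2024-01-02 12:00", "2024-01-05 12:00"),
]


def response_hours(received, submitted):
    """Elapsed time in hours between receipt of a request and delivery."""
    delta = datetime.strptime(submitted, FMT) - datetime.strptime(received, FMT)
    return delta.total_seconds() / 3600


def summarize(hours):
    """Range, mean, median, and mode(s) of a list of response times."""
    # Modes are taken over whole hours so that near-equal times group together.
    rounded = [round(h) for h in hours]
    return {
        "range": (min(hours), max(hours)),
        "mean": mean(hours),
        "median": median(hours),
        "modes": multimode(rounded),
    }


def mean_shift(baseline, treated):
    """Difference in mean response time (treated minus baseline).

    A negative value means the treated sample was faster on average.
    This is a descriptive figure only, not a significance test.
    """
    return mean(treated) - mean(baseline)


if __name__ == "__main__":
    hours = [response_hours(r, s) for r, s in SAMPLE]
    print(summarize(hours))
```

The first two functions correspond to observation of the system as it stands; `mean_shift` corresponds to the experimental case, where one sample is gathered under normal operation and another with the change applied.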
In the case of the evaluation of an information retrieval system, this stage of the study is mainly concerned with the derivation and manipulation of performance results (for example, recall and precision ratios) and with the analysis of