unequivocal factual answers. The answer to such a question as 'What is the melting point of ...?' is either supplied completely and correctly or it is not. Question-answering services, whether the answer is supplied from a printed source or a machine-readable data bank, must therefore be evaluated in terms of the completeness and accuracy of the data supplied.

6.3 Some problems of evaluation applied to operating systems

In a book on experimentation in information retrieval, a chapter on the evaluation of operating systems may be considered something of an interloper. Evaluation does not necessarily imply experimentation. In fact, the evaluation of an operating system will not usually involve any experimentation; it will merely be an analysis of the performance of the system at a particular point in time. This does not mean that experimentation is impossible within an operating environment. It is possible, but it might be quite difficult.

Although most are not framed in this way, it is possible for an evaluation of an operating system to take the shape of a formal research project with a hypothesis that the investigators set out to test. An example of such a hypothesis might be 'Literature searches conducted for requestors who visit the centre in person produce better results, in terms of recall and precision, than those conducted for users who submit their requests by mail to the centre' (a sketch of how such a comparison might be computed appears at the end of this section). It is possible to build an evaluation upon a research hypothesis of this type. But, unlike the true experimental situation, in an operating environment it may be quite difficult to control all the independent variables that may affect the results. This is not to imply that controls should not be sought: variables extraneous to the focus of the study should be controlled as much as possible. In the evaluation of an operating system, established principles of experimental design, sampling and survey methodology are as relevant and important as they are in any other evaluation situation. It must be recognized, however, that, when dealing with a real-life environment, some methodological compromises may need to be made.

While the evaluation may not be based on a formal research hypothesis, it should certainly have some clearly defined evaluation objectives. The major steps involved in the conduct of an evaluation programme are the following:

(1) Defining the scope of the evaluation.
(2) Designing the evaluation programme.
(3) Executing the evaluation.
(4) Analysing and interpreting the results.
(5) Modifying the system or service on the basis of the evaluation results.

Definition of scope

The first step, the definition of scope, entails the preparation of a precise set of questions that the evaluation must be designed to answer. The purpose of an evaluation is to learn more about the capabilities and weaknesses of a system.
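The example hypothesis given earlier compares two groups of searches in terms of recall and precision. As a minimal sketch of how such a comparison might be computed, the following Python fragment (not part of the original chapter; the function, variable names and figures are all invented for illustration) calculates the two measures for individual searches and averages them over each group of requests. Note that in an operating system the full set of relevant documents cannot be known with certainty and would in practice have to be estimated.

    # Hypothetical sketch: recall and precision per search, averaged by group.
    # Each search is a pair (retrieved, relevant) of sets of document numbers.

    def recall_precision(retrieved, relevant):
        """Return (recall, precision) for a single search."""
        hits = len(retrieved & relevant)          # relevant documents retrieved
        recall = hits / len(relevant) if relevant else 0.0
        precision = hits / len(retrieved) if retrieved else 0.0
        return recall, precision

    # Invented example data for the two groups in the hypothesis.
    in_person = [({1, 2, 3, 9}, {1, 2, 3, 4}), ({5, 6}, {5, 6, 7})]
    by_mail   = [({1, 8, 9}, {1, 2, 3, 4}), ({5}, {5, 6, 7})]

    for label, group in (("in person", in_person), ("by mail", by_mail)):
        scores = [recall_precision(ret, rel) for ret, rel in group]
        mean_recall = sum(r for r, _ in scores) / len(scores)
        mean_precision = sum(p for _, p in scores) / len(scores)
        print(f"{label}: mean recall {mean_recall:.2f}, "
              f"mean precision {mean_precision:.2f}")

Averaging over searches in this way is only one of several possible choices; whichever summary statistic is used, the point made in the text stands: the comparison is only meaningful to the extent that extraneous variables affecting the two groups of requests have been controlled.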