or service, and the definition of scope is really a statement of what precisely is to be learned through the study. The definition of scope must be prepared by the person requesting the evaluation, who is usually one of the managers of the system or one of those responsible for funding it. It is the responsibility of the evaluator to design a study capable of answering all the questions posed in the definition of scope. A sample work statement for an evaluation programme is given in Table 6.2. This statement is a list of the questions to be answered in the MEDLARS study, as reported by Lancaster1. It is a rather long list because the study was a comprehensive evaluation of a very large system. Evaluation studies of more modest scope would involve fewer questions. In fact, it is quite conceivable that an evaluation might be designed to answer only one or two important questions.

TABLE 6.2. Example of a `Work Statement' for an evaluation of an operating information service

Overall performance
(1) What is the overall performance level of the system in relation to user requirements? Are there significant differences for various types of request and in various broad subject areas?

Coverage and processing
(1) How sound are present policies regarding indexing coverage?
(2) Is the delay between receipt of a journal and its processing in the indexing section significantly affecting performance?

Indexing
(1) Are there significant variations in inter-indexer performance?
(2) How far is this related to experience in indexing and to degree of `revising'?
(3) Do the indexers recognize the specific concepts that are of interest to various user groups?
(4) What is the effect of present policies relating to exhaustivity of indexing?

Index language
(1) Are the terms sufficiently specific?
(2) Are variations in specificity of terms in different areas significantly affecting performance?
(3) Is the need for additional precision devices, such as weighting, role indicators, or a form of interlocking, indicated?
(4) Is the quality of term association in the thesaurus adequate?
(5) Is the present entry vocabulary adequate?

Searching
(1) What are the requirements of the users regarding recall and precision?
(2) Can search strategies be devised to meet requirements for high recall or high precision?
(3) How effectively can searchers screen output? What effect does screening have on recall and precision figures? (A brief numerical sketch follows the table.)
(4) What are the most promising modes of user/system interaction?
    a. Having more liaison at the request stage.
    b. Having more liaison at the search formulation stage.
    c. An iterative search procedure that presents the user with a sample of citations retrieved by a `first approximation' search, and allows him to reformulate his request in the light of these retrieved items.
(5) What is the effect on response time of these various modes of interaction?
Input and computer processing
(1) Do input procedures, including various aspects of clerical processing, result in a significant number of errors?
(2) Are computer programs flexible enough to obtain desired performance levels? Do they achieve the required checks on clerical error?
(3) What part of the overall response lag can be attributed to the data processing subsystem? What are the causes of delays in this subsystem?
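To make the recall and precision questions under `Searching' concrete, the following is a minimal illustrative sketch, not part of the original work statement: it computes recall and precision for a single search, before and after the searcher screens the output. The document identifiers and relevance judgements are hypothetical.

```python
# Illustrative only: hypothetical document IDs and relevance judgements.

def recall_precision(retrieved, relevant):
    """Return (recall, precision) for one search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    return recall, precision

relevant  = {"d01", "d02", "d03", "d04", "d05"}         # items judged relevant to the request
retrieved = {"d01", "d02", "d03", "d07", "d09", "d11"}  # items retrieved by the search
screened  = {"d01", "d02", "d07"}                       # items passed to the user after screening

print(recall_precision(retrieved, relevant))  # (0.6, 0.5)
print(recall_precision(screened, relevant))   # (0.4, ~0.67): screening raises precision, lowers recall
```

In this hypothetical case screening improves precision (fewer non-relevant items reach the user) at the cost of recall (a relevant item is discarded), which is the trade-off that question (3) under `Searching' asks the evaluation to measure.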