effort might be required to identify the relevant items in a printed or typed list, especially if it contains only bibliographic citations and the user must himself retrieve copies of many of the documents before he can decide which are relevant and which are not. But this measure of effort is really only appropriate to the evaluation of a delegated search, one conducted on behalf of the requester by an information specialist. In this situation the system is viewed as more or less a 'black box', into which a request is placed and out of which comes a group of documents or references to them. The precision ratio is a valid measure of the performance of any type of delegated search in which the information seeker submits a request to some 'system' and waits for the results, whether the search is manual or fully mechanized.

The precision ratio is not especially meaningful when applied to the non-delegated search. Here, the user conducts his own search and makes relevance decisions continuously as he proceeds; that is, when he consults an index term in a printed index or an online system, he rejects irrelevant citations and records only those which seem relevant. A precision ratio could be derived for this type of search by counting the total number of citations the user consulted and the number he judged relevant, the precision ratio being the number of relevant citations found divided by the total number of citations consulted. This is a rather artificial measurement, however, because user effort in the non-delegated search situation can be expressed more directly in terms of the time required to conduct the search, and, for this, a unit cost in time per relevant item found can be determined. Presumably, the higher the precision of a non-delegated search (the proportion of relevant items examined to the total items examined), the less time it takes, all other things being equal.

Leaving aside direct costs, four performance criteria by which any type of literature search, manual or mechanized, may be evaluated from the viewpoint of user satisfaction have been discussed thus far: recall, precision, response time, and user effort. The salient points of these performance measures are as follows:

Recall. Important to all users of information services who are seeking bibliographic materials on a particular subject. In some cases, only a minimum level of recall is required (for example, one book or a few articles on a particular subject), and this is likely to be the most typical situation. In other cases, maximum recall is sought (for example, by the user who wants a comprehensive search conducted in Chemical Abstracts).

Precision. A meaningful measure of the performance of a delegated search conducted in any form of system, manual or mechanized.
It is an indirect measure of user time and effort and not particularly appropriate in the evaluation of non-delegated searches, including non-delegated searches in online retrieval systems.

User Effort. In a non-delegated search, effort is measured by the amount of time the user spends conducting the search. In a delegated search, it is measured by the amount of time the user spends negotiating his inquiry with the system and the amount of time he needs, when the search results are delivered to him, to separate the relevant from the irrelevant items, which is directly related to the precision ratio.
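To make the two quantitative measures above concrete, the following minimal sketch (not part of the original text) shows how the precision ratio and the unit time cost per relevant item might be computed for a non-delegated search. The function names and the sample figures are assumptions introduced purely for illustration.

```python
# Illustrative sketch of the two user-oriented measures discussed above.

def precision_ratio(relevant_found: int, total_consulted: int) -> float:
    """Precision ratio: relevant citations found / total citations consulted."""
    if total_consulted == 0:
        return 0.0
    return relevant_found / total_consulted

def time_per_relevant_item(search_minutes: float, relevant_found: int) -> float:
    """Unit cost in time per relevant item found (non-delegated search)."""
    if relevant_found == 0:
        return float("inf")
    return search_minutes / relevant_found

# Hypothetical figures: the user consults 40 citations, judges 10 of them
# relevant, and spends 50 minutes conducting the search.
print(precision_ratio(10, 40))           # 0.25
print(time_per_relevant_item(50, 10))    # 5.0 minutes per relevant item
```

On the assumption in the text that higher precision generally means less wasted examination time, the two figures would be expected to move together: as the precision ratio rises, the time cost per relevant item found should fall, all other things being equal.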