IRE: Information Retrieval Experiment, ed. Karen Sparck Jones, Butterworth & Company. Chapter: Evaluation within the environment of an operating information service, by F. Wilfrid Lancaster.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Response Time. In a delegated search, this represents the time elapsing between the submission of a request by the user and his receipt of the search results. In a non-delegated search, it represents the time spent actually conducting the search; in this case, it is also a measure of user effort.

Table 6.1 lists some further performance criteria that may be applied in the evaluation of information retrieval systems, including 'coverage' and 'novelty'. Coverage may be considered an extension of recall; it expresses how much of the literature on a specific subject is contained in a particular database. Suppose, for example, that a scientist wishes to find all possible references to the use of lasers in eye surgery. An obvious source would be the printed Index Medicus or, even better, the computer-based MEDLINE service operated by the National Library of Medicine (NLM). Suppose also that the search in the NLM database retrieves everything of relevance, that is, achieves 100 per cent recall (a rather unlikely situation). Even if the search is complete so far as the database is concerned, the user who needs a really comprehensive search also wants to know the exact coverage of the database, that is, what proportion of all the literature on eye surgery using lasers is contained in the database.
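Recall and coverage can both be read as set ratios, which makes the distinction between them concrete. The following is a minimal sketch, assuming (purely for illustration) that the complete set of relevant literature is known and that documents are identified by hypothetical IDs; in practice, as the text stresses, neither set is easy to establish. The function names `recall` and `coverage` are illustrative, not taken from any system described here.

```python
def recall(retrieved: set[str], relevant_in_db: set[str]) -> float:
    """Proportion of the relevant items held in the database that the search retrieved."""
    if not relevant_in_db:
        raise ValueError("no relevant items in the database")
    return len(retrieved & relevant_in_db) / len(relevant_in_db)


def coverage(database: set[str], all_relevant_literature: set[str]) -> float:
    """Proportion of all relevant literature on the subject that the database contains."""
    if not all_relevant_literature:
        raise ValueError("no relevant literature given")
    return len(database & all_relevant_literature) / len(all_relevant_literature)


# Hypothetical example: ten relevant papers exist on lasers in eye surgery,
# and the database holds four of them (d1-d4) plus two non-relevant items.
all_relevant = {f"d{i}" for i in range(1, 11)}
database = {"d1", "d2", "d3", "d4", "x1", "x2"}
relevant_in_db = database & all_relevant
retrieved = {"d1", "d2", "d3", "d4", "x1"}

print(recall(retrieved, relevant_in_db))  # 1.0: every relevant item in the database found
print(coverage(database, all_relevant))   # 0.4: yet only 4 of the 10 relevant papers are held
```

The search is "complete" with respect to the database, yet the user who needs comprehensiveness still sees only 40 per cent of the relevant literature.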
Searching a particular database may yield 100 per cent recall and yet give low overall coverage of the literature. Absolute coverage of the collection is of direct concern only to the person who needs a comprehensive search. The user whose need is satisfied by finding one or two books on a subject of interest on the library shelves is probably quite unconcerned about how complete the library's collection is in that subject area. Later, however, he may require a comprehensive search on this or some other topic, and the coverage of the collection consulted would then matter to him.

Coverage, like recall and precision, can be expressed as a percentage. If, for example, the results of a search conducted in Chemical Abstracts were being evaluated, it might be estimated, though not very easily, that the recall ratio is 75 per cent; it might also be estimated, even less easily, that the coverage of Chemical Abstracts in the subject area of the search is 40 per cent. Since the search retrieves 75 per cent of the relevant items in a database that holds only 40 per cent of the relevant literature, the overall comprehensiveness of the search is estimated as the product of the two: 0.40 × 0.75 = 0.30, or 30 per cent.

Another performance measure that may have some value is the novelty ratio: the proportion of relevant items retrieved in a search that are new to the requester, that is, brought to his attention for the first time by the search. The novelty ratio is particularly appropriate for evaluating literature searches conducted for current awareness purposes, that is, selective dissemination of information (SDI), since a good current awareness service presumably brings documents to the attention of users before they learn of them by other means.

When cost criteria are related to quality criteria, cost-effectiveness criteria are derived. Possible cost-effectiveness criteria for information services include the unit cost per relevant item (document or document reference) retrieved and the unit cost per new relevant item retrieved.
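The arithmetic behind these measures can be sketched in a few lines. Overall comprehensiveness is taken as the product of coverage and recall, the novelty ratio is new relevant items divided by relevant items retrieved, and each unit cost is total cost divided by the relevant count. This is a minimal sketch: the function names and the search counts and cost figure in the second example are hypothetical, chosen only to illustrate the definitions; only the 40 and 75 per cent figures come from the text.

```python
def comprehensiveness(coverage: float, recall: float) -> float:
    """Estimated share of all relevant literature that the search actually found."""
    return coverage * recall


def novelty_ratio(new_relevant: int, relevant_retrieved: int) -> float:
    """Proportion of relevant retrieved items brought to the user's attention for the first time."""
    if relevant_retrieved == 0:
        raise ValueError("no relevant items retrieved")
    return new_relevant / relevant_retrieved


def unit_cost(total_cost: float, items: int) -> float:
    """Cost per item; pass the count of relevant items, or of new relevant items only."""
    if items == 0:
        raise ValueError("no items to spread the cost over")
    return total_cost / items


# Chemical Abstracts figures from the text: 40 per cent coverage, 75 per cent recall.
# round() hides binary floating-point noise (0.4 * 0.75 is not exactly 0.3 in floats).
print(round(comprehensiveness(0.40, 0.75), 2))  # 0.3

# Hypothetical search: 20 relevant items retrieved, 8 of them new to the user,
# at a total cost of 40 units (money, or time and effort expended).
print(novelty_ratio(8, 20))   # 0.4
print(unit_cost(40.0, 20))    # 2.0 per relevant item
print(unit_cost(40.0, 8))     # 5.0 per new relevant item
```

Because the unit cost per new relevant item divides by a smaller count, it is always at least as high as the unit cost per relevant item, and the gap widens as the novelty ratio falls.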
Cost can be measured directly in monetary units or in time and effort expended. There is still one further evaluation criterion listed in Table 6.1, namely, accuracy of data. This criterion substitutes for recall and precision in the evaluation of information services designed to answer questions that have A