the output of a search. Consider, as an illustration, a search request for which there are 20 relevant documents in a particular database. Suppose that three different search strategies are used to interrogate the system and that each retrieves 15 of the 20 relevant items; that is, recall is 75 per cent. In the first search, the total number of items retrieved is 30, in the second it is 60, and in the third it is 150. The precision ratio in these three searches is 50, 25, and 10 per cent, respectively. In the first search the user has to examine only 30 citations to find the 15 of relevance; in the second, 60; and in the third, 150. All other things being equal, it takes him longer to separate the relevant from the irrelevant in the second search than in the first, and considerably longer in the third. It is in this sense that we can regard the precision ratio as a measure of user effort or cost. A search that achieves 75 per cent recall at 25 per cent precision is more efficient than one that achieves 75 per cent recall at 10 per cent precision.

Not everyone needs high recall all the time. Different users have different requirements for recall and precision, and a particular individual has different requirements at different times. The precision tolerance of the user is likely to be directly related to his recall requirements. At one end of the spectrum we have the individual who is writing a book, preparing a review article, or beginning a long-term research project. He is likely to want a comprehensive (high recall) search, and he may tolerate fairly low precision in order to assure himself that he has not missed anything of importance. At the other end, we have the typical user of, say, an industrial information service who needs a few recent articles on a subject and needs them right away. He does not need high recall but he expects high precision in the search results. Other individuals may prefer a compromise; they would like a 'reasonable' level of recall at an 'acceptable' level of precision.

It seems rather pointless to use the recall ratio as a measure of the success of a search in which high recall is unimportant. This has led some writers to suggest the use of a measure of proportional recall, or relative recall, in which the success of the search is expressed in terms of the number of relevant documents retrieved over the number of relevant documents wanted by the requester. For example, the requester specifies that he needs five relevant documents, but the search retrieves only three. The proportional recall ratio is, therefore, 3/5, or 60 per cent. This measure, although attractive on the surface, is rather artificial in that very few requesters are able to specify in advance just how many documents they want from the system.
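The ratios discussed above follow directly from their definitions. As a minimal sketch (the figures are those used in the illustration; the function names are purely illustrative and not part of the original text), the arithmetic can be checked as follows:

    # Illustrative only: reproduces the recall, precision, and relative-recall
    # figures used in the text above.

    def recall(relevant_retrieved, relevant_in_database):
        # Proportion of all relevant documents in the database that were retrieved.
        return relevant_retrieved / relevant_in_database

    def precision(relevant_retrieved, total_retrieved):
        # Proportion of the retrieved documents that are relevant.
        return relevant_retrieved / total_retrieved

    def relative_recall(relevant_retrieved, relevant_wanted):
        # Relevant documents retrieved over relevant documents wanted by the requester.
        return relevant_retrieved / relevant_wanted

    # Three hypothetical searches, each retrieving 15 of the 20 relevant items:
    for total_retrieved in (30, 60, 150):
        print(recall(15, 20), precision(15, total_retrieved))  # 0.75 with 0.5, 0.25, 0.1

    # The requester wants five relevant documents; the search retrieves three:
    print(relative_recall(3, 5))  # 0.6, i.e. 60 per cent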
Another limitation of the recall ratio is that it more or less assumes that all relevant documents have approximately equal value. This is not always true. A search may retrieve 5 relevant documents and miss 10 (recall ratio = 33 per cent), but the 5 retrieved may be much better than the 10 missed. They could, for example, be more up-to-date and might in fact make the other 10 items completely redundant. The recall ratio, although important, must therefore be used with some caution in the evaluation of information services.

The precision ratio also has its limitations. As we have already seen, it is actually an indirect measure of user time and effort spent at the output stage of the information retrieval process; that is, the higher the precision ratio, the less effort the user needs to expend in separating relevant items from those which are not. In a search of very low precision ratio in which, say, only 10 items among 80 retrieved are judged relevant, considerable user time and effort must be expended in separating the relevant items from the irrelevant.