Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

Chapter: Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
the output of a search. Consider, as an illustration, a search request for which
there are 20 relevant documents in a particular database. Suppose that three
different search strategies are used to interrogate the system and that each
retrieves 15 of the 20 relevant items; that is, recall is 75 per cent. In the first
search, the total number of items retrieved is 30, in the second it is 60, and in
the third it is 150. The precision ratio in these three searches is 50, 25, and 10
per cent, respectively. In the first search the user has to examine only 30
citations to find the 15 of relevance; in the second, 60; and in the third, 150.
All other things being equal, it takes him longer to separate the relevant from
the irrelevant in the second search than in the first, and considerably longer
in the third. It is in this sense that we can regard the precision ratio as a
measure of user effort or cost. A search that achieves 75 per cent recall at 25
per cent precision is more efficient than one that achieves 75 per cent recall
at 10 per cent precision.
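As a rough illustration of this arithmetic, the following sketch (in Python; the function names are mine, and the figures are simply those of the three searches above) computes recall and precision for each search:

```python
# Sketch of the recall and precision arithmetic for the illustrative
# example above: 20 relevant documents in the database, 15 of them
# retrieved in each of three searches of increasing size.

def recall(relevant_retrieved, relevant_in_database):
    """Proportion of the relevant documents in the database that were retrieved."""
    return relevant_retrieved / relevant_in_database

def precision(relevant_retrieved, total_retrieved):
    """Proportion of the retrieved documents that are relevant."""
    return relevant_retrieved / total_retrieved

relevant_in_database = 20
searches = [
    ("first",  15, 30),    # (label, relevant retrieved, total retrieved)
    ("second", 15, 60),
    ("third",  15, 150),
]

for label, rel_ret, tot_ret in searches:
    print(f"{label} search: recall = {recall(rel_ret, relevant_in_database):.0%}, "
          f"precision = {precision(rel_ret, tot_ret):.0%}")

# first search:  recall = 75%, precision = 50%
# second search: recall = 75%, precision = 25%
# third search:  recall = 75%, precision = 10%
```

Recall is identical in all three searches; only the amount of non-relevant material the user must scan, and hence the effort expended, differs.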
Not everyone needs high recall all the time. Different users have different
requirements for recall and precision, and a particular individual has
different requirements at different times. The precision tolerance of the user
is likely to be directly related to his recall requirements. At one end of the
spectrum we have the individual who is writing a book, preparing a review
article, or beginning a long-term research project. He is likely to want a
comprehensive (high recall) search, and he may tolerate fairly low precision
in order to assure himself that he has not missed anything of importance. At
the other end, we have the typical user of, say, an industrial information
service who needs a few recent articles on a subject and needs them right
away. He does not need high recall but he expects high precision in the
search results. Other individuals may prefer a compromise; they would like
a 'reasonable' level of recall at an 'acceptable' level of precision.
It seems rather pointless to use the recall ratio as a measure of the success
of a search in which high recall is unimportant. This has led some writers to
suggest the use of some measure of proportional recall, or relative recall, in
which the success of the search is expressed in terms of the number of
relevant documents retrieved over the number of relevant documents wanted
by the requester. For example, the requester specifies that he needs five
relevant documents, but the search retrieves only three. The proportional
recall ratio is, therefore, 3/5, or 60 per cent. This measure, although attractive
on the surface, is rather artificial in that very few requesters are able to
specify in advance just how many documents they want from the system.
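For concreteness, the proportional (relative) recall calculation could be sketched as follows; the names are hypothetical, and the guard against an unstated target simply reflects the limitation just noted:

```python
# Sketch of proportional (relative) recall: relevant documents retrieved
# over the number of relevant documents the requester said he wanted.

def relative_recall(relevant_retrieved, relevant_wanted):
    """Relative recall with respect to the requester's stated need."""
    if relevant_wanted is None:
        # Few requesters can specify in advance how many documents they
        # want, which is the practical weakness of this measure.
        raise ValueError("requester did not state how many relevant documents are wanted")
    return relevant_retrieved / relevant_wanted

print(f"{relative_recall(3, 5):.0%}")   # requester wanted 5, search found 3 -> 60%
```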
Another limitation of the recall ratio is that it more or less assumes that all
relevant documents have approximately equal value. This is not always true.
A search may retrieve 5 relevant documents and miss 10 (recall ratio = 33 per
cent), but the 5 retrieved may be much better than the 10 missed. They could,
for example, be more up-to-date and might in fact make the other 10 items
completely redundant. The recall ratio, although important, must therefore
be used with some caution in the evaluation of information services.
The precision ratio also has its limitations. As we have already seen, it is
actually an indirect measure of user time and effort spent at the output stage
of the information retrieval process; that is, the higher the precision ratio,
the less effort the user needs to expend in separating relevant items from
those which are not. In a search of very low precision ratio in which, say, only
10 items among 80 retrieved are judged relevant, considerable user time and