Information Retrieval Experiment
Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
Edited by Karen Sparck Jones
Butterworth & Company
(1) A particular document whose identity is known.
(2) Specific factual information of the kind that might come from some type of reference book or from a machine-readable data bank (for example, thermophysical property data on a particular substance).
(3) A few 'good' articles, or references to them, on a specific topic.
(4) A comprehensive literature search in a particular subject area.
(5) A current alerting service by which the user is kept informed of new
literature relevant to his current professional interests.
These different needs have different response time requirements associated
with them. The requirement relating to the current alerting service is that it
should deliver regularly and frequently and that the information supplied
should be as up-to-date as possible. The user needing a comprehensive
literature search is usually engaged in a relatively long-term research project.
Speed of response may not be critical to him, except that there may be some
date beyond which the search results will have no value or, at least, greatly
reduced value; he is willing to wait longer in order to achieve completeness;
that is, completeness is more important to him than speed. For the other
types of information needs, on the other hand, the user generally wants fairly
rapid response.
The cost and time criteria relevant to the evaluation of information
services seem fairly obvious and are relatively constant from one activity to
another. But the quality criteria are perhaps less obvious and vary
considerably with the particular service being evaluated. They may also vary
with the kind of need that a particular user has in relation to a service.
There seem to be two major qualitative measures of success as applied to
information services:
(1) Does the user get what he is seeking or not?
(2) How completely or accurately does he get it?
The first of these measures, which applies, for example, to the search for
a particular item or the answer to a particular factual question, is simple and
unequivocal. The second, however, is much more difficult to apply in practice
because it implies both a human value judgement and the use of some
graduated scale to reflect degree of success. The second type of measure is
necessary, however, in the evaluation of most types of information retrieval
activity. 'Recall' and 'precision' are two criteria frequently used to judge the
performance of a search in an information retrieval system. Because these
measures are well known and well accepted in the evaluation of operating
information services, they will not be defined here.
The precision ratio and the recall ratio, used jointly, express the filtering
capacity of the system: its ability to let through what is wanted and to hold
back what is not. Neither one on its own gives a complete picture of the
effectiveness of a search. It is always possible to get 100 per cent recall if we
retrieve enough of the total collection; if we retrieve the entire collection, we
certainly achieve 100 per cent recall. Unfortunately, however, precision
would be extremely low in this situation because, for any typical request, the
great majority of the items in the collection are not relevant.
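To make the joint behaviour of the two ratios concrete, the following short Python sketch computes recall and precision for a hypothetical search, using the standard definitions (relevant items retrieved over relevant items in the collection, and relevant items retrieved over items retrieved). The collection size and relevance judgements are invented purely for illustration and do not come from the chapter.

```python
# Illustrative sketch only: the figures below are invented, not taken from the text.
# Recall    = relevant items retrieved / relevant items in the collection
# Precision = relevant items retrieved / items retrieved

def recall_precision(retrieved, relevant):
    """Compute recall and precision from two sets of document identifiers."""
    relevant_retrieved = len(retrieved & relevant)
    recall = relevant_retrieved / len(relevant) if relevant else 0.0
    precision = relevant_retrieved / len(retrieved) if retrieved else 0.0
    return recall, precision

# A hypothetical collection of 10 documents, 4 of which are relevant to the request.
collection = set(range(1, 11))
relevant = {2, 5, 7, 9}

# A selective search: reasonably high precision, incomplete recall.
selective_search = {2, 5, 8}
print(recall_precision(selective_search, relevant))   # (0.5, 0.666...)

# Retrieving the entire collection: recall reaches 100 per cent,
# but precision falls to the proportion of relevant items in the collection.
print(recall_precision(collection, relevant))         # (1.0, 0.4)
```

The second call shows the point made above: total recall is always attainable by retrieving everything, but only at the cost of very low precision.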
The precision ratio may be viewed as a type of cost factor in user time: the
time required to separate the relevant citations from the irrelevant ones in