Information Retrieval Experiment
Edited by Karen Sparck Jones
Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
Butterworth & Company
effort might be required to identify the relevant items in a printed or typed
list, especially if it contains only bibliographic citations and the user must
himself retrieve copies of many of the documents before he can decide which
are relevant and which are not. But this measure of effort is really only
appropriate to the evaluation of a delegated search, one conducted on behalf
of the requester by an information specialist. In this situation the system is
viewed as more or less a 'black box', into which a request is placed and out
of which comes a group of documents or references to them. The precision
ratio is a valid measure of the performance of any type of delegated search in
which the information seeker submits a request to some 'system' and waits
for the results, whether the search is manual or fully mechanized.
The precision ratio is not especially meaningful when applied to the non-
delegated search. Here, the user conducts his own search and makes relevance
decisions continuously as he proceeds; that is, when he consults an index
term in a printed index or an online system, he rejects irrelevant citations and
records only those which seem relevant. A precision ratio could be derived
for this type of search by counting the total number of citations the user
consulted and the number he judged relevant, the precision ratio being the
number of relevant citations found divided by the total number of citations
consulted. This is a rather artificial measurement, however, because user
effort in the non-delegated search situation can be expressed more directly in
terms of the time required to conduct the search, and, for this, a unit cost in
time per relevant item found can be determined. Presumably, the higher the
precision of a non-delegated search (proportion of relevant items examined
to the total items examined), the less time it takes, all other things being
equal.
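Stated as formulas, these two measurements amount to the following (the symbols are introduced here purely for illustration: r is the number of relevant citations found, n the total number of citations consulted, and t the time spent on the search):

\[ \text{precision ratio} = \frac{r}{n}, \qquad \text{unit time cost per relevant item} = \frac{t}{r} \]

For a fixed r, a higher precision ratio implies a smaller n, and hence, other things being equal, a smaller t, which is the sense of the closing remark above.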
Leaving aside direct costs, four performance criteria by which any type of
literature search, manual or mechanized, may be evaluated from the
viewpoint of user satisfaction have been discussed thus far: recall, precision,
response time, and user effort. The salient points of these performance
measures are as follows:
Recall. Important to all users of information services who are seeking
bibliographic materials on a particular subject. In some cases, only a
minimum level of recall is required (for example, one book or a few
articles on a particular subject), and this is likely to be the most typical
situation. In other cases, maximum recall is sought (for example, by the user
who wants a comprehensive search conducted in Chemical Abstracts); a
numerical illustration of how recall and precision differ is sketched after
this list.
Precision. A meaningful measure of the performance of a delegated search
conducted in any form of system, manual or mechanized. It is an indirect
measure of user time and effort and not particularly appropriate in the
evaluation of non-delegated searches, including non-delegated searches in
online retrieval systems.
User Effort. In a non-delegated search, effort is measured by the amount of
time the user spends conducting the search. In a delegated search, it is
measured by the amount of time the user spends negotiating his inquiry
with the system and the amount of time he needs, when the search results
are delivered to him, to separate the relevant from the irrelevant items,
which is directly related to the precision ratio.
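To make the difference between the two ratios concrete, consider a purely illustrative example (the figures are hypothetical and not drawn from any study discussed in this chapter). Suppose a delegated search retrieves 40 citations, of which the requester judges 30 relevant, and suppose the collection searched actually contains 50 relevant items:

\[ \text{precision} = \frac{30}{40} = 75\%, \qquad \text{recall} = \frac{30}{50} = 60\% \]

A search can therefore score well on one ratio and poorly on the other; which matters more depends on whether the user needs only a few good items or a comprehensive set.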