IRE
Information Retrieval Experiment
Evaluation within the environment of an operating information service
chapter
F. Wilfrid Lancaster
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
Response Time. In a delegated search, this represents the time elapsing
between the submission of a request by the user and his receipt of the
search results. In a non-delegated situation, it represents the time involved
in the actual conduct of the search; in this case, it is also a measure of user
effort.
Table 6.1 lists some further performance criteria that may be applied in the
evaluation of information retrieval systems, including 'coverage' and
'novelty'. Coverage may be considered an extension of recall; it is expressed
in terms of how much coverage of the literature on a specific subject is
provided by a particular database. Suppose, for example, that a scientist
wishes to find all possible references to the use of lasers in eye surgery. An
obvious source would be the printed Index Medicus or, even better, the
computer-based MEDLINE service operated by the National Library of
Medicine (NLM). Suppose also that the search in the NLM database
retrieves everything of relevance, that is, achieves 100 per cent recall, a
rather unlikely situation. Even if the search is complete, so far as the database
is concerned, the user who needs a really comprehensive search also wants to
know the exact coverage of the database, that is, what proportion of all the
literature on eye surgery using lasers is contained in the database. Searching
a particular database may result in 100 per cent recall but may give a low
overall coverage of the literature. Absolute coverage of the collection is only
of direct concern to the person who needs a comprehensive search. It is
probable that the user whose need is satisfied by finding, on the library
shelves, one or two books on a subject of interest is quite unconcerned as to
how complete the library's collection may be in this subject area. At a later
time, however, he may require a comprehensive search on this or some other
topic, and the coverage of the collection consulted would then be important
to him. Coverage, like recall and precision, can be expressed as a percentage.
If, for example, the results of a search conducted in Chemical Abstracts were
being evaluated, it could be estimated, not very easily, that the recall ratio is
75 per cent; it could also be estimated, even less easily, that the coverage of
Chemical Abstracts on the subject area of the search is 40 per cent. With an
estimated coverage of 40 per cent and recall of 75 per cent the overall
estimate of the comprehensiveness of the search is 30 per cent.
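
The arithmetic behind this estimate can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and coverage and recall are passed as fractions rather than percentages purely for convenience.

```python
def overall_comprehensiveness(coverage: float, recall: float) -> float:
    """Estimate the fraction of all relevant literature that a search found.

    Both arguments are fractions between 0 and 1:
    coverage = share of the relevant literature contained in the database,
    recall   = share of the relevant items in the database that were retrieved.
    """
    if not (0.0 <= coverage <= 1.0 and 0.0 <= recall <= 1.0):
        raise ValueError("coverage and recall must lie between 0 and 1")
    return coverage * recall


# Figures from the Chemical Abstracts example in the text.
estimate = overall_comprehensiveness(coverage=0.40, recall=0.75)
print(f"{estimate:.0%}")  # prints 30%
```

The product form shows why a search with perfect recall can still be far from comprehensive: the estimate can never exceed the coverage of the database itself.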
Another performance measure that may have some value is the novelty
ratio, the proportion of relevant items retrieved in a search that are new to
the requester, that is, brought to his attention for the first time by the search.
The novelty ratio is particularly appropriate in the evaluation of literature
searches conducted for current awareness purposes, that is, SDI, since,
presumably, a good current awareness service brings documents to the
attention of users before they learn of them by other means.
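
As a minimal sketch, the novelty ratio can be computed from two sets of document identifiers. The function name and the identifiers below are hypothetical, and the sketch assumes the requester can say which of the relevant retrieved items were already known to him.

```python
def novelty_ratio(relevant_retrieved: set, already_known: set) -> float:
    """Proportion of relevant retrieved items that are new to the requester."""
    if not relevant_retrieved:
        raise ValueError("no relevant items were retrieved")
    # Items the search brought to the requester's attention for the first time.
    new_items = relevant_retrieved - already_known
    return len(new_items) / len(relevant_retrieved)


# Hypothetical search: four relevant items retrieved, one already known.
relevant = {"doc1", "doc2", "doc3", "doc4"}
known = {"doc2"}
print(novelty_ratio(relevant, known))  # prints 0.75
```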
When cost criteria are related to quality criteria, cost-effectiveness criteria
are derived. Some possible cost-effectiveness criteria applicable to
information services include the unit cost per relevant item (document or document
reference) retrieved and the unit cost per new relevant item retrieved. Cost
can be measured directly in monetary units or in time and effort expended.
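
The two unit costs named above can be sketched as follows. All figures are invented for illustration, and the cost may equally be read as money or as minutes of effort; the function name is hypothetical.

```python
def unit_cost(total_cost: float, item_count: int) -> float:
    """Cost per item, where cost is in monetary units or in time expended."""
    if item_count <= 0:
        raise ValueError("item_count must be positive")
    return total_cost / item_count


# Hypothetical search: total cost 30.0, 12 relevant items, 8 of them new.
search_cost = 30.0
relevant_retrieved = 12
new_relevant_retrieved = 8

cost_per_relevant = unit_cost(search_cost, relevant_retrieved)
cost_per_new_relevant = unit_cost(search_cost, new_relevant_retrieved)
print(cost_per_relevant, cost_per_new_relevant)  # prints 2.5 3.75
```

Since the new relevant items are a subset of the relevant items, the cost per new relevant item is never lower than the cost per relevant item.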
There is still one further evaluation criterion listed in Table 6.1, namely,
accuracy of data. This criterion substitutes for recall and precision in the
evaluation of information services designed to answer questions that have