Information Retrieval Experiment
Edited by Karen Sparck Jones
Chapter: Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
Butterworth & Company
or service, and the definition of scope is really a statement of what precisely
is to be learned through the study. The definition of scope must be prepared
by the person requesting the evaluation, who is usually one of the managers
of the system or one of those responsible for funding it. It is the responsibility
of the evaluator to design a study capable of answering all the questions
posed in the definition of scope. A sample work statement for an evaluation
programme is given in Table 6.2. This statement is a list of the questions to
be answered in the MEDLARS study, as reported by Lancaster1. It is a
rather long list because the study was a comprehensive evaluation of a very
large system. Evaluation studies of more modest scope would involve fewer
questions. In fact, it is quite conceivable that an evaluation might be designed
to answer only one or two important questions.
TABLE 6.2. Example of a `Work Statement' for an evaluation of an operating information service
Overall performance
(1) What is the overall performance level of the system in relation to user requirements? Are
there significant differences for various types of request and in various broad subject areas?
Coverage and processing
(1) How sound are present policies regarding indexing coverage?
(2) Is the delay between receipt of a journal and its processing in the indexing section
significantly affecting performance?
Indexing
(1) Are there significant variations in inter-indexer performance?
(2) How far is this related to experience in indexing and to degree of `revising'?
(3) Do the indexers recognize the specific concepts that are of interest to various user groups?
(4) What is the effect of present policies relating to exhaustivity of indexing?
Index language
(1) Are the terms sufficiently specific?
(2) Are variations in specificity of terms in different areas significantly affecting performance?
(3) Is the need for additional precision devices, such as weighting, role indicators, or a form of
interlocking, indicated?
(4) Is the quality of term association in the thesaurus adequate?
(5) Is the present entry vocabulary adequate?
Searching
(1) What are the requirements of the users regarding recall and precision?
(2) Can search strategies be devised to meet requirements for high recall or high precision?
(3) How effectively can searchers screen output? What effect does screening have on recall and
precision figures?
(4) What are the most promising modes of user/system interaction?
a. Having more liaison at the request stage.
b. Having more liaison at the search formulation stage.
c. An iterative search procedure that presents the user with a sample of citations retrieved
by a `first approximation' search, and allows him to reformulate his request in the light
of these retrieved items.
(5) What is the effect on response time of these various modes of interaction?
Input and computer processing
(1) Do input procedures, including various aspects of clerical processing, result in a significant
number of errors?
(2) Are computer programs flexible enough to obtain desired performance levels? Do they
achieve the required checks on clerical error?
(3) What part of the overall response lag can be attributed to the data processing subsystem?
What are the causes of delays in this subsystem?
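The recall and precision figures referred to under `Searching' above are the standard measures of retrieval effectiveness: recall is the proportion of the relevant items in the collection that a search retrieves, i.e. recall = (relevant items retrieved)/(total relevant items in the collection), and precision is the proportion of the retrieved items that are relevant, i.e. precision = (relevant items retrieved)/(total items retrieved). As a purely illustrative example (the figures are hypothetical and not drawn from the MEDLARS study), a search that retrieves 40 of the 50 relevant documents in the collection, together with 160 non-relevant documents, achieves a recall of 40/50 = 80 per cent and a precision of 40/200 = 20 per cent.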