Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

Evaluation within the environment of an operating information service
F. Wilfrid Lancaster
proof-oriented experiments or into insight-oriented experiments. By the
former, we mean concentration of evaluator effort into large-scale
systematic and statistically valid investigations of one type of variability
with other conditions being relatively fixed; by the latter we mean a
spreading of evaluator effort over a number of investigative forays designed
to give not proof but insight as to how the experimental variables interact
with one another.
A proof-oriented experiment should lead to a well-defined statement of
conclusion backed up with an analysis of variance of the results and
identified confidence limits. However, such experiments are premature
unless one knows exactly what one wants to prove and the conditions
under which the proof is interesting. It is not of much interest to know that
search option A is proven to be better than search option B under given
conditions with confidence 0.9999; what is really of interest is whether
the given conditions are actually realistic, how much better is A than B, and
is this better enough to be of real concern?
Insight-oriented experiments may or may not lead to well-defined
conclusions, and one such experiment may or may not be sufficiently
meaningful statistically to constitute convincing proof in the face of
withering doubt. However, several such tests can sometimes be performed
for the cost of one proof-oriented test, and the pattern of observed results
might tell a lot more about the system being investigated than any single
test, no matter how firm the conclusions of that one test are.' (Page 43)
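The quoted distinction between statistical proof and practical importance can be made concrete with a small numerical sketch. The following fragment is not from the chapter; the recall figures, the sample size, and the normal-approximation confidence interval are invented assumptions used purely for illustration. It shows how a difference of about one point of recall between two search options can be 'proved' at a very high confidence level on a large enough sample of queries, while remaining too small to be of real operational concern.

    # Hypothetical illustration (not from the chapter): per-query recall for two
    # search options, A and B, where B is better by only about 0.01 on average.
    import random
    import statistics

    random.seed(0)
    n_queries = 2000  # a large, proof-oriented sample of test queries

    recall_a = [min(1.0, max(0.0, random.gauss(0.60, 0.10))) for _ in range(n_queries)]
    recall_b = [min(1.0, max(0.0, a + random.gauss(0.01, 0.05))) for a in recall_a]

    diffs = [b - a for a, b in zip(recall_a, recall_b)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / (len(diffs) ** 0.5)

    # Approximate 95 per cent confidence interval for the mean paired difference
    # (normal approximation; a paired t-test gives nearly the same answer at
    # this sample size).
    low, high = mean_diff - 1.96 * std_err, mean_diff + 1.96 * std_err

    print(f"mean recall, option A: {statistics.mean(recall_a):.3f}")
    print(f"mean recall, option B: {statistics.mean(recall_b):.3f}")
    print(f"mean difference (B - A): {mean_diff:.4f}")
    print(f"approximate 95% confidence interval: ({low:.4f}, {high:.4f})")
    # The interval excludes zero, so the improvement is statistically 'proved',
    # but a gain of roughly one point of recall may not justify any change to
    # the operating service.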
Applying the results of an evaluation
Once an evaluation has been conducted, the results must be analysed and
interpreted with a view to making improvements in the service. In the
application of the results of an evaluation, also, the real life situation may
differ substantially from the experimental one. In the latter, it should always
be possible to make changes in any part of the system and to conduct further
tests to assess the effect of such changes. In the real world, however, it may
not be possible to make certain types of changes even though these changes
have been shown to be highly desirable. To take one example, the results of
an evaluation may strongly suggest that the personnel performing the
indexing for a database should also be the personnel responsible for searching
that database. The evaluation results have shown many examples of
disagreement or lack of communication between the indexers and the
searchers. But it may be practically impossible to integrate the two activities
in a particular organization. In a government agency, for instance, it may be
easier to get more money than it is to get authorization to increase personnel
levels. The indexing is done outside the agency, under contract, and there is
no possibility of increasing the staff so that both functions are performed by
the same people. In the real world, also, there may be other types of
constraints that are less likely to apply to the experimental situation. An
operating information service must frequently make compromises. The
theoretical ideal is not always attainable in practice. Some compromises may
be necessitated by the fact that the centre operates as part of a larger
enterprise (perhaps a network of some kind) and it may be willing to
compromise on vocabulary, record formats and other things in order to