Evaluation within the environment of an operating information service
F. Wilfrid Lancaster

proof-oriented experiments or into insight-oriented experiments. By the former, we mean concentration of evaluator effort into large-scale, systematic and statistically valid investigations of one type of variability with other conditions being relatively fixed; by the latter we mean a spreading of evaluator effort over a number of investigative forays designed to give not proof but insight as to how the experimental variables interact with one another. A proof-oriented experiment should lead to a well-defined statement of conclusion backed up with an analysis of variance of the results and identified confidence limits. However, such experiments are premature unless one knows exactly what one wants to prove and the conditions under which the proof is interesting. It is not of much interest to know that search option A is proven to be better than search option B under given conditions with confidence 0.9999; what is really of interest is whether the given conditions are actually realistic, how much better is A than B, and is this better enough to be of real concern? Insight-oriented experiments may or may not lead to well-defined conclusions, and one such experiment may or may not be sufficiently meaningful statistically to constitute convincing proof in the face of withering doubt. However, several such tests can sometimes be performed for the cost of one proof-oriented test, and the pattern of observed results might tell a lot more about the system being investigated than any single test, no matter how firm the conclusions of that one test are.' (Page 43)

Applying the results of an evaluation

Once an evaluation has been conducted, the results must be analysed and interpreted with a view to making improvements in the service. In the application of the results of an evaluation, also, the real-life situation may differ substantially from the experimental one. In the latter, it should always be possible to make changes in any part of the system and to conduct further tests to assess the effect of such changes. In the real world, however, it may not be possible to make certain types of changes even though these changes have been shown to be highly desirable. To take one example, the results of an evaluation may strongly suggest that the personnel performing the indexing for a database should also be the personnel responsible for searching that database. The evaluation results have shown many examples of disagreement or lack of communication between the indexers and the searchers. But it may be practically impossible to integrate the two activities in a particular organization. In a government agency, for instance, it may be easier to get more money than it is to get authorization to increase personnel levels.
The indexing is done outside the agency, under contract, and there is no possibility of increasing the staff so that both functions are performed by the same people. In the real world, also, there may be other types of constraints that are less likely to apply to the experimental situation. An operating information service must frequently make compromises. The theoretical ideal is not always attainable in practice. Some compromises may be necessitated by the fact that the centre operates as part of a larger enterprise, perhaps a network of some kind, and it may be willing to compromise on vocabulary, record formats and other things in order to