Information Retrieval Experiment (IRE), edited by Karen Sparck Jones. Butterworth & Company.

7 Opportunities for testing with online systems

Elizabeth D. Barraclough

7.1 Introduction

Information retrieval testing in the early years was concerned with finding out what was theoretically possible when retrieving bibliographical records from a database. The databases were small collections of records, indexed and searched manually. These early experiments produced a methodology of testing, in particular the two performance measures of precision and recall.

The early computer-based systems were concerned first with demonstrating feasibility and then with trying to improve the performance of the system, measured in the same way as in the manual experiments. The computer allowed more complex searching techniques to be tried, but most of these experiments were done on relatively small static databases which had long since ceased to provide an information service to users.

Unfortunately, much of this work has been ignored by the system providers. Few of the techniques demonstrated to improve precision or recall, or to provide more efficient computer processing, have been incorporated in any of the large, generally available, online systems. Instead, the system providers rely on the provision of extensive databases accessible to the user to sell their system. From the commercial point of view they are very successful: users do tend to opt for the system with the most data available.
The performance of such systems, in terms of precision and recall, has largely been ignored. Many of the users are unaware, or unconcerned, that they are not achieving the best that the system can provide. The time is ripe for experiments on current systems in order to demonstrate to the users the type of service they are really getting. Naturally, such experiments are more difficult to perform than those in a static environment and, as we shall see, there are many constraints which can bias the results.

Lancaster, in the previous chapter, has amply covered the methods of evaluating systems, including those in a real-life environment. The function of this chapter is to complement those evaluation techniques and to try to show how they can be brought closer to online systems. Most of the evaluation tests that have been done consider the online system as an indivisible entity. If systems are to be improved, then tests must be carried out in much more detail; one experiment by Rouse and Lannom¹ goes some way along this route but not yet far enough.
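The two performance measures referred to throughout this introduction can be stated concretely. The sketch below assumes the standard definitions (precision is the fraction of retrieved records that are relevant; recall is the fraction of relevant records that are retrieved). The function name `precision_recall` and the record identifiers are hypothetical and are not taken from any system discussed in this chapter.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single search.

    retrieved: identifiers of the records the system returned
    relevant:  identifiers of the records judged relevant to the request
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant records that were actually retrieved

    # Assumed convention: an empty denominator yields 0.0 rather than an error.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four records retrieved, five judged relevant,
# two records in common.
retrieved = ["d1", "d2", "d3", "d4"]
relevant = ["d2", "d4", "d7", "d8", "d9"]
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

In an operational online system the set of relevant records is rarely known in full, so recall in particular usually has to be estimated, which is one reason such experiments are harder than those on a static test collection.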