IRE
Information Retrieval Experiment
Opportunities for testing with online systems
chapter
Elizabeth D. Barraclough
Butterworth & Company
Karen Sparck Jones
7
Opportunities for testing with online systems
Elizabeth D. Barraclough
7.1 Introduction
Information retrieval testing in the early years was concerned with finding
out what was theoretically possible when retrieving bibliographical records
from a database. The databases were small collections of records indexed
and searched manually. These early experiments produced a methodology of
testing, in particular the two performance measures of precision and recall.
The early computer-based systems were initially concerned with
demonstrating feasibility and then with trying to improve the performance of
the system, measured in the same way as in the manual experiments. The use
of the computer allowed more complex searching techniques to be tried, but
most of these experiments were done on relatively small static databases
which had long since ceased to provide an information service to users.
Unfortunately, much of this work has been ignored by the system
providers. Few of the techniques demonstrated to improve precision or recall,
or to provide more efficient computer processing, have been incorporated in
any of the large, generally available, online systems. Instead the system
providers rely on the provision of extensive databases accessible to the user
to sell their system. From the commercial point of view they are very
successful. Users do tend to opt for the system with the most data available.
The performance of such systems, in terms of precision and recall, has largely
been ignored. Many of the users are unaware, or unconcerned, that they are
not achieving the best that the system can provide. The time is ripe for
experiments on current systems in order to demonstrate to the users the type
of service they are really getting. Naturally such experiments are more
difficult to perform than those in a static environment and, as we shall see,
there are many constraints which can bias the results.
Lancaster, in the previous chapter, has amply covered the methods of
evaluating systems, including those in a real-life environment. The function
of this chapter is to complement the evaluation techniques and try to show
how these can be brought closer to online systems. Most of the evaluation
tests that have been done consider the online system as an indivisible entity.
If systems are to be improved then tests must be carried out in much more
detail; one experiment by Rouse and Lannom¹ goes some way along this
route but not yet far enough.