IRE
Information Retrieval Experiment
Laboratory tests: automatic systems
chapter
Robert N. Oddy
Butterworth & Company
Karen Sparck Jones
treatment of concepts relevant to information retrieval shifts from one region
of the diagram to another. Ideas that previously made their first well-specified
appearance in the programs now have a place in the mathematical structure
of the prescriptive theory from which the programs are derived. An example
of this process is evident in Croft's work5. Croft proposes a theoretical
underpinning of known3 techniques for searching clusters of document
descriptions. Such matters as matching functions and cluster representatives
are dealt with in the theory rather than being specified in an ad hoc manner
in the program. (There are, of course, analogous phenomena in the
development of other prescriptive theories: men and women flew with a
certain degree of success at a time when they had dangerously little knowledge
of aerodynamics.) Theories of information retrieval based on cognitive
considerations are in their infancy at the moment, and we find that a number
of relatively imprecise assumptions are made; the theory consists largely of
non-mathematical arguments leading to a prescription of a general nature
only; and the program design (if such exists) consequently abounds in ad hoc
decisions. An example of work at this early stage of development is
the attempt of Belkin and Oddy19 to design an information retrieval system
based on a notion of anomalies in the state of knowledge of an enquirer.
Perhaps some of the formal structures which evolve in the programming will
ultimately find their way into a more formal theory.
It is not my intention in this chapter to survey computer-based laboratory
testing of information retrieval techniques. My concern is rather with the
role of this type of laboratory work in information retrieval research, present
and future, and with its limitations and difficulties. The potential benefits of
the methodology can be summarized as follows:
(1) Control. The whole test is performed by a machine and is thus, in
principle, entirely manageable. The computer is a perfect laboratory
assistant. All experiments are exactly repeatable, and observation
(monitoring) is carried out with accuracy and consistency. Components
of the system can be isolated and modified or replaced, without affecting
the rest of the system. The components can therefore be individually
evaluated.
(2) Speed. Needless to say, the execution of searches can be very rapid on a
computer, and evaluation measurements can easily be collected and
processed, automatically and immediately. In addition, amendments or
corrections to a search strategy can often be made by editing small
sections of the program, within a matter of a few days.
(3) Power of expression. Programming gives us an additional formal mode of
expression for models and theories, and therefore has a useful part to play
in theory development.
(4) Prototype development. One can rarely copy program code from a
laboratory environment directly into a real life system. However, the
laboratory program can be a useful aid to operational system specification.
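
The points about control and speed can be illustrated with a minimal sketch of a laboratory test, written here in Python. Everything in it is an assumption made for illustration and is not drawn from any system discussed in this chapter: the toy collection, the single query, the relevance judgments, and the names overlap, dice, search and run_experiment are all hypothetical. The matching function is passed in as a component, so it can be replaced without touching the rest of the program, and precision and recall are collected automatically for each run.

```python
from typing import Callable, Dict, List, Set, Tuple

# A matching function scores a document description against a query.
Matcher = Callable[[Set[str], Set[str]], float]


def overlap(query: Set[str], doc: Set[str]) -> float:
    """Simple coordination level: number of shared terms."""
    return float(len(query & doc))


def dice(query: Set[str], doc: Set[str]) -> float:
    """Dice coefficient: shared terms normalised by description sizes."""
    total = len(query) + len(doc)
    return 2.0 * len(query & doc) / total if total else 0.0


def search(query: Set[str], collection: Dict[str, Set[str]],
           matcher: Matcher, cutoff: int) -> List[str]:
    """Rank documents by score; ties are broken by identifier so that
    every run of the experiment yields exactly the same ranking."""
    ranked = sorted(collection,
                    key=lambda d: (-matcher(query, collection[d]), d))
    return ranked[:cutoff]


def precision_recall(retrieved: List[str],
                     relevant: Set[str]) -> Tuple[float, float]:
    """Evaluation measurements collected automatically for one query."""
    hits = len(set(retrieved) & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def run_experiment(queries: Dict[str, Set[str]],
                   collection: Dict[str, Set[str]],
                   judgments: Dict[str, Set[str]],
                   matcher: Matcher,
                   cutoff: int = 2) -> Dict[str, Tuple[float, float]]:
    """Run every query with the given matching function."""
    results = {}
    for qid in sorted(queries):
        retrieved = search(queries[qid], collection, matcher, cutoff)
        results[qid] = precision_recall(retrieved, judgments[qid])
    return results


# Hypothetical toy test collection (illustrative data only).
COLLECTION = {
    "d1": {"cluster", "search", "document"},
    "d2": {"cluster", "representative", "matching", "function",
           "document", "retrieval"},
    "d3": {"aerodynamics", "flight"},
    "d4": {"search", "strategy"},
}
QUERIES = {"q1": {"cluster", "search"}}
JUDGMENTS = {"q1": {"d1", "d4"}}

if __name__ == "__main__":
    # Only the matching function changes between the two runs.
    print(run_experiment(QUERIES, COLLECTION, JUDGMENTS, overlap))
    print(run_experiment(QUERIES, COLLECTION, JUDGMENTS, dice))
```

On this toy data the overlap run retrieves d1 and d2 (precision 0.5, recall 0.5), while the dice run retrieves d1 and d4 (precision 1.0, recall 1.0). Since nothing else differs between the runs, the change in the figures can be attributed to the matching function alone, and repeating either run reproduces its result exactly.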
Some of the limitations of the methodology, and the difficulties associated
with it, are:
(1) Restricted view. Human factors in operational information retrieval
systems are usually not taken into account in tests of the type which
we are presently considering. Some factors, such as command language
and display contents and format, could conceivably overshadow the