IRE
Information Retrieval Experiment
Laboratory tests: automatic systems
chapter
Robert N. Oddy
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
upon. In recent years a considerable amount of research has been done on the
problems of reliability in computer systems in general. A very useful
collection of papers is to be found in Anderson and Randell69. Among the
techniques for improving program reliability, those for fault avoidance, as
opposed to tolerance, and those not requiring a special programming system
are of most interest to the experimental information retrieval programmer.
Disciplined use of a good programming methodology and a language which
allows for a reasonably natural expression of the process and data structures
are the most significant techniques.
Correctness of a program is judged with respect to a specification of the
required function and behaviour of the program. The judgement can be
made in two ways:
(1) The program can be tested: Gerhart70 discusses the principles of testing.
A program must always be tested. The testing of an information retrieval
laboratory program is quite conventional, although constructing test data
can be tedious. However, testing can never be exhaustive. One may try
to test program modules exhaustively, but modules have a habit of
interacting with each other in unexpected ways when they are run
together, so ultimately one is faced with the prospect of checking every
conceivable output of the complete program. One compensates for the
inevitably partial testing by constantly keeping an eye on the reasonableness
of all output produced by the program, and by combining testing
with the second method of judging correctness: reasoning about the
program.
(2) The program may be proved to be consistent with the specification. This
is extremely difficult and the proofs tend to be unwieldy, but if the
program is well structured, very informal proofs which are convincing
enough for most purposes can often be done for parts of it.
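
The two methods can be combined in quite a small space. What follows is only a minimal sketch in Python, not a description of any system discussed in this chapter. The names check_ranking and precision_recall, and the representation of a ranking as (document identifier, score) pairs, are assumptions made for the illustration. The first function keeps "an eye on the reasonableness" of a ranked output; the second carries a very informal proof as a comment and checks its conclusion at run time.

```python
from typing import Sequence, Set, Tuple

# Hypothetical representation: (document id, matching score), best match first.
Ranking = Sequence[Tuple[str, float]]


def check_ranking(ranking: Ranking, collection_ids: Set[str]) -> None:
    """Raise AssertionError if a ranked output is unreasonable on its face."""
    ids = [doc_id for doc_id, _ in ranking]
    scores = [score for _, score in ranking]
    assert len(ids) == len(set(ids)), "a document was retrieved twice"
    assert set(ids) <= collection_ids, "a retrieved document is not in the collection"
    assert all(a >= b for a, b in zip(scores, scores[1:])), "scores are not in descending order"


def precision_recall(ranking: Ranking, relevant: Set[str]) -> Tuple[float, float]:
    """Compute precision and recall of the retrieved set against the relevant set."""
    retrieved = {doc_id for doc_id, _ in ranking}
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    # Informal proof: the intersection is a subset of both sets, so
    # hits <= len(retrieved) and hits <= len(relevant); both ratios lie in [0, 1].
    assert 0.0 <= precision <= 1.0 and 0.0 <= recall <= 1.0
    return precision, recall


if __name__ == "__main__":
    ranking = [("d1", 0.9), ("d3", 0.5), ("d2", 0.5)]
    check_ranking(ranking, {"d1", "d2", "d3", "d4"})
    print(precision_recall(ranking, {"d1", "d4"}))
```

Checks of this kind do not make testing exhaustive. They only ensure that an unreasonable output stops the run instead of passing silently into the experimental results.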
With information retrieval programs, we must be clear what we mean by
the specification. I am concerned at the moment with the technical problem
of obtaining a correct program, and not the research problem of obtaining an
ideal program design. Thus correctness is to be judged against the researcher's
design, rather than against the system user's requirement. For this concept
of correctness to have a straightforward meaning, the semantics of the
researcher's system specification must be quite clear, that is it must be
possible to express it formally. There is no problem, in principle, if the
system is a consequence of a mathematical theory. If, however, the program
is the model, in the sense discussed in the previous section, then we are
reduced to talking about validating the program against itself. The researcher
will probably have a detailed description of the program (perhaps similar in
appearance to the extract given above of the description of Thomas), but this
is not formal. The meaning of the description is worked out in the program.
It is therefore possible that the researcher will be experimenting with a model
which differs from his original intention in some unknown way, and which
he does not fully understand. (He will have the program text, in some form,
but it does not follow that he understands the model.) Of course, many
programs which have independent formal specifications for part of them also
incorporate heuristics, and that fact, strictly speaking, puts them into the
same category. The researcher would normally make the assumption that