Information Retrieval Experiment
Chapter: Laboratory tests: automatic systems
Robert N. Oddy
Edited by Karen Sparck Jones
Butterworth & Company
does understand his program. He may make statements such as `I did x and
observed effect y', in which x is his less-than-formal intention. What I wish
to emphasize here is the obligation of the researcher to set himself high
programming standards, so that he has a high degree of knowledge about his
programs.
If it is appropriate to do so, the experimenter should make use of existing,
tested software. His laboratory may collect a subroutine library containing
tried program modules to perform certain specific tasks. If he can incorporate
some of these into his own program, he will not only speed its development
and testing, but he will also, incidentally, assist in trying the modules as he
runs his experiment. An important reason for trusting the numerical accuracy
of the results which emerge from established information retrieval labora-
tories such as Smart8 and Cambridge University6 is that `standard' programs
and packages are used whenever possible. Frequent use over several years
should have revealed most faults.
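The kind of 'tried module' described above can be sketched in modern terms. The following is a hypothetical illustration (the routine, its name, and the test values are my own, not drawn from any laboratory's actual library): a small cosine-similarity function of the sort that might live in a tested subroutine library and be reused, and thereby further exercised, across many experiments.

```python
import math

def cosine_similarity(query, document):
    """Cosine of the angle between two sparse term-weight vectors,
    each given as a dict mapping term -> weight.

    A routine like this, once tested, can be reused across experiments;
    every reuse also serves as a further trial of the module."""
    shared = set(query) & set(document)
    dot = sum(query[t] * document[t] for t in shared)
    q_norm = math.sqrt(sum(w * w for w in query.values()))
    d_norm = math.sqrt(sum(w * w for w in document.values()))
    if q_norm == 0.0 or d_norm == 0.0:
        return 0.0
    return dot / (q_norm * d_norm)

# An experimenter incorporating the library routine into a new program
# speeds its development and testing at the same time.
q = {"retrieval": 1.0, "experiment": 1.0}
d = {"retrieval": 2.0, "laboratory": 1.0}
print(round(cosine_similarity(q, d), 4))
```

The point is not the particular measure but the practice: the experimenter writes against a routine whose behaviour is already well known, rather than re-coding it afresh.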
All experimental sciences suffer to some extent from the inevitable
fallibility of apparatus. In addition to doing all it can to ensure that the
apparatus is working correctly, the research community must maintain a
moderate scepticism in its reception of results: they should be checked by
repetition. In information retrieval research, there is very little repetition of
tests for this purpose, perhaps because the computer-orientated research
community is so small. A useful repetition experiment would be expensive
and time-consuming. It is not adequate simply to obtain a copy of the
original programs, edit them if necessary, and run them again on another
machine. Any faults in the programs would also be copied (and it is in the
program that faults are most likely to be). The programs must be written
again from an independent specification. Exact agreement between the two
sets of results can rarely be expected. Differences in programming language
facilities and hardware characteristics, such as computer word-size, will
affect the respective programmers' interpretation of the specification in
minor ways, which may in turn affect results. Whether a difference in results
can be disregarded is a question that must be answered in the light of the
nature of the computations.
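The comparison just described can be made concrete. The sketch below is hypothetical (the function, the tolerance, and the figures are illustrative assumptions, not data from any actual replication): two independently programmed runs are compared not for exact equality but under a relative tolerance chosen in the light of the nature of the computations.

```python
import math

def results_agree(run_a, run_b, rel_tol=1e-3):
    """Compare two lists of effectiveness figures (e.g. precision at
    fixed recall levels) produced by two independent implementations
    of the same experimental specification.

    Exact agreement is not demanded: small discrepancies from
    word-size, rounding, and language facilities are tolerated."""
    if len(run_a) != len(run_b):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol)
               for a, b in zip(run_a, run_b))

# Illustrative figures only: the replicate differs in the last decimal
# place, as independent implementations typically will.
original  = [0.7512, 0.6140, 0.4023]
replicate = [0.7511, 0.6142, 0.4025]
print(results_agree(original, replicate))
```

Whether a tolerance of this size is defensible is exactly the judgement the text calls for: it must be argued from the computations involved, not assumed.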
9.8 Conclusion
The interaction between computer-based laboratory experimentation and
information retrieval theory development has been very fruitful during the
last decade or so. The skills that the experimenters have learned enable them
to test (some) new ideas within days, or even hours. I am sure that the well-
developed situation in the laboratories has made a significant contribution to
the considerable theoretical progress in the field. Some researchers appear to
believe that their understanding of the information retrieval problem,
through mathematical theories, has attained a plateau: quite a lofty plateau!
In this work, there has been too little reference to the real world to justify this
optimism. There is no denying the elegance and the power of the current
probabilistic and term discrimination theories (to pick the most prominent
examples) to prescribe retrieval programs which are very effective under the
conditions obtaining in the laboratory. I have tried to point out reasons why