Information Retrieval Experiment, edited by Karen Sparck Jones. Chapter: Laboratory tests: automatic systems, by Robert N. Oddy. Butterworth & Company.

does understand his program. He may make statements such as `I did x and observed effect y', in which x is his less-than-formal intention. What I wish to emphasize here is the obligation of the researcher to set himself high programming standards, so that he has a high degree of knowledge about his programs. If it is appropriate to do so, the experimenter should make use of existing, tested software. His laboratory may collect a subroutine library containing tried program modules to perform certain specific tasks. If he can incorporate some of these into his own program, he will not only speed its development and testing, but he will also, incidentally, assist in trying the modules as he runs his experiment. An important reason for trusting the numerical accuracy of the results which emerge from established information retrieval laboratories such as Smart8 and Cambridge University6 is that `standard' programs and packages are used whenever possible. Frequent use over several years should have revealed most faults.

All experimental sciences suffer to some extent from the inevitable fallibility of apparatus. In addition to doing all it can to ensure that the apparatus is working correctly, the research community must maintain a moderate scepticism in its reception of results: they should be checked by repetition.
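Checking results by repetition presupposes some rule for deciding when two independently produced sets of figures agree. A minimal sketch of such a rule, in Python, with a relative tolerance chosen by the experimenter; the function name, the figures, and the tolerance are all illustrative, not drawn from the text:

```python
def results_agree(run_a, run_b, rel_tol=1e-3):
    """Return True if two lists of effectiveness figures (e.g.
    precision at fixed recall levels) agree within a relative
    tolerance chosen in the light of the computations involved."""
    if len(run_a) != len(run_b):
        return False
    for a, b in zip(run_a, run_b):
        # The max(..., 1e-12) guards against division-free comparison
        # blowing up when both figures are zero.
        if abs(a - b) > rel_tol * max(abs(a), abs(b), 1e-12):
            return False
    return True

# Hypothetical precision figures at 10%, 20%, ... recall from two
# independently written programs; a small discrepancy attributable to
# word size or order of floating-point evaluation is tolerated.
original    = [0.62, 0.55, 0.47, 0.38, 0.29]
replication = [0.62, 0.55, 0.4702, 0.38, 0.29]
print(results_agree(original, replication, rel_tol=1e-2))  # True
```

How wide the tolerance should be is exactly the judgement the text calls for: it must be set in the light of the nature of the computations, not fixed once for all experiments.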
In information retrieval research, there is very little repetition of tests for this purpose, perhaps because the computer-orientated research community is so small. A useful repetition experiment would be expensive and time-consuming. It is not adequate simply to obtain a copy of the original programs, edit them if necessary, and run them again on another machine: any faults in the programs would also be copied (and it is in the program that faults are most likely to be). The programs must be written again from an independent specification. Exact agreement between the two sets of results can rarely be expected. Differences in programming language facilities and in hardware characteristics, such as computer word size, will affect the respective programmers' interpretations of the specification in minor ways, which may in turn affect results. Whether a difference in results can be disregarded is a question that must be answered in the light of the nature of the computations.

9.8 Conclusion

The interaction between computer-based laboratory experimentation and information retrieval theory development has been very fruitful during the last decade or so. The skills that the experimenters have learned enable them to test (some) new ideas within days, or even hours. I am sure that the well-developed situation in the laboratories has made a significant contribution to the considerable theoretical progress in the field. Some researchers appear to believe that their understanding of the information retrieval problem, through mathematical theories, has attained a plateau, and quite a lofty plateau! In this work, there has been too little reference to the real world to justify this optimism. There is no denying the elegance and the power of the current probabilistic and term discrimination theories (to pick the most prominent examples) to prescribe retrieval programs which are very effective under the conditions obtaining in the laboratory. I have tried to point out reasons why