IRE: Information Retrieval Experiment. Chapter: Simulation, and simulation experiments. Michael D. Heine. Butterworth & Company. Karen Sparck Jones.

variation of relevance judgements (for an algorithmically defined query, say, and a set of information needs) on Recall at the Precision value 0.60, this might properly be termed an 'experiment', since the relevance judgements, one's object of study, are not under the control of the experimenter. But if, for a given exercise of relevance judgement by a person (applied to all items in a test collection, say), we examined the relative effects on Recall (at P = 0.60) of varying the document weighting expression, this might better be termed a 'simulation experiment'. In the latter instance, all components of the system are (a) known, and (b) under control. The outcome of the investigation is determined by logic: it is 'pre-determined'. (We would probably use a computer to obtain the output data, though that is more incidental, and that alone would not justify a process being termed a simulation.)

In this introduction, the reasons or motivations for undertaking simulation studies (in the narrower senses) of information retrieval, in preference to experimental studies, have not been discussed in detail. This is because they are fully discussed in general terms in the standard texts on simulation (e.g. Churchman⁸, Martin⁹, Gordon¹⁰), and because the reasons tend to carry little conviction until one has actually undertaken a simulation study, at least in the author's experience.
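The 'simulation experiment' described above, in which the relevance judgements are held fixed and only the document weighting expression is varied, can be sketched in a few lines. What follows is a minimal illustration under stated assumptions, not the author's procedure: the toy collection `DOCS`, the query `QUERY`, the judgements `RELEVANT`, and the two weighting expressions `coordination_weight` and `dice_weight` are all hypothetical and invented for the purpose. Recall "at P = 0.60" is read here as the highest Recall reached at any ranking cutoff whose Precision is at least 0.60, which is one possible interpretation among several.

```python
from typing import Callable, Dict, Set

# Hypothetical toy test collection: document id -> index terms (invented data).
DOCS: Dict[str, Set[str]] = {
    "d1": {"retrieval", "simulation", "index", "model", "cost"},
    "d2": {"retrieval", "simulation"},
    "d3": {"retrieval", "index", "model", "cost"},
    "d4": {"simulation", "index", "model"},
    "d5": {"simulation"},
    "d6": {"retrieval"},
    "d7": {"index"},
}
QUERY: Set[str] = {"retrieval", "simulation"}

# Fixed relevance judgements: known and under the investigator's control.
RELEVANT: Set[str] = {"d2", "d5", "d6"}

Weighting = Callable[[Set[str], Set[str]], float]


def coordination_weight(doc_terms: Set[str], query_terms: Set[str]) -> float:
    """Assumed weighting 1: number of query terms the document shares."""
    return float(len(doc_terms & query_terms))


def dice_weight(doc_terms: Set[str], query_terms: Set[str]) -> float:
    """Assumed weighting 2: Dice coefficient (overlap normalised by set sizes)."""
    return 2.0 * len(doc_terms & query_terms) / (len(doc_terms) + len(query_terms))


def recall_at_precision(weight: Weighting, min_precision: float = 0.60) -> float:
    """Highest Recall over ranking cutoffs whose Precision is >= min_precision."""
    # Rank by descending weight; ties are broken by document id so the
    # outcome is fully determined by the inputs.
    ranked = sorted(DOCS, key=lambda d: (-weight(DOCS[d], QUERY), d))
    hits = 0
    best = 0.0
    for cutoff, doc_id in enumerate(ranked, start=1):
        if doc_id in RELEVANT:
            hits += 1
        if hits / cutoff >= min_precision:
            best = max(best, hits / len(RELEVANT))
    return best


if __name__ == "__main__":
    for name, fn in [("coordination", coordination_weight), ("dice", dice_weight)]:
        print(name, recall_at_precision(fn))
```

With this invented data the coordination weighting never reaches a Precision of 0.60 (Recall 0.0), while the Dice weighting reaches full Recall (1.0). Rerunning the script always gives the same figures, which is the sense in which the outcome of such an investigation is 'pre-determined'.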
But briefly it might be claimed that (a) the simulation study itself requires that a formal representation of the system (called a 'model' by some writers) be arrived at, this itself giving valuable insight into the system, (b) manipulation of data and paths within a conceptual framework of a system (i.e. within a formal structure or model) is more economical of effort, money and time than manipulation of the real system so represented and measurement of data pertaining to it (i.e. than experimentation on the system), which of course begs the question of the validity of the formal structure, and (c) the simulation suggests new areas for observational (experimental and investigative) work predicated on the validity of the constructions that it comprises.

In summary, simulation in its broadest sense is of interest to information retrieval workers because of the very uncertainty of its definition; because it provokes our interest in the conceptual roots of representation and transfer of information (or should we say, of representation and the transfer of representation?), the main topics of what we call 'information science'. This is of concern in information retrieval experiments both because such experiments appear to serve as a prototype for experiments on information transfer construed more generally, and because more formal (theoretical) study of information transfer may give insights into the process of information retrieval as we usually regard it. Our experiments and investigations of document transfer at the macroscopic 'human' level are into instances of information transfer that are perhaps artificially circumscribed.
On the other hand, simulation in its narrowest sense, that of the representation of randomness in relationships between people, documents, document attributes, and logical expressions representing 'information needs', is of interest and value in challenging the terms and concepts we use, in distinguishing between tautological findings (i.e. findings simply a consequence of the language of description) and findings that are not tautological ('scientific' findings), and in suggesting new experiments consistent with conjectures within the language of description. But truths suggested by a simulation experiment are always suspect, in that the structure of the simulation is one I