Information Retrieval Experiment
Edited by Karen Sparck Jones
Butterworth & Company

Chapter: Simulation, and simulation experiments
Michael D. Heine

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

Introduction

`simulation experiment'. All are processes that we recognize to be mixed cognitive/behavioural. They all help us `get on better' with the world in which we find ourselves by acquiring information for us, i.e. they alter the mimetic structures that govern our individual and social responses to the constraints and opportunities offered by other systems. However, investigation and experiment (treating these as similar though distinct processes) have two features that simulation does not: an experimental apparatus is needed for them to be implemented (even though, for investigation of document/user interaction, say, the apparatus may just be a file of records), and no suppositions are made as to how the information acquired is generated (one can, for example, experimentally measure the acceleration `due to gravity' without knowing how the system of primitive entities determining the acceleration is affecting the apparatus and so determining the data).

Simulation, on the other hand, does not require an apparatus, and does concern itself with how information (data) is generated by a system. Suppositions are made about the entities making up the system (though the entities are regarded as `constructions' rather than `descriptions', so that `suppositions' is not quite the right term), about the relationships between entities, and about the effects of system input upon entities and (possibly) relationships.
This definitional structure is then used to predict the output or outputs of the system, which can be described as information or data. Thus, unlike experiment and investigation, simulation work treats the determining entities not as primitive ones but as objects of study. That is essentially the strength and weakness of simulation work, as it is of all `theoretical' study: the objects of interest and experimental study are made explicit, but they remain constructs.

This may seem a trivial or fine point, since in practice when simulation is applied at the human level (queues for tickets, say) or in an area of technology where `laws' are well established, all entities of relevance seem clearly evident, and some of them are under our control (e.g. the number of serving booths) or can at least be influenced by us. But in view of the unclear foundations of information science, it seems essential to emphasize that the outcomes of simulation work, since they are based on a human construct, can never `surprise' us (and so inform us) as much as experimental results can. We can be a bit surprised by the results of a simulation study (e.g. in regard to a pattern of symmetry or an instance of invariance that we missed in an experiment) but never very surprised, since the simulation explores a structure that we ourselves created: the results are, in that sense, tautological or necessary. Just as mathematics as an edifice of thought is inviolate and `safe' (that's what it's there for), so is a simulation study. Both lack the open (i.e. receptive to amendment) syntax of science, a syntax which encourages information feedback that modifies and invigorates its structure when that information is obtained from experimentation.

Returning to terminology now, we define a simulation study as a `simulation experiment' when the system's components (e.g.
the parameters of probability distributions) are given certain values, or are explored in a certain order, and the consequences are noted. In regard to Cranfield-type experimentation in information retrieval, it is a moot point whether some such work should be described as `experiment' or `simulation', simply because it is not natural phenomena but man-created phenomena that are being explored. If an information retrieval experiment examines the effect of