IRE
Information Retrieval Experiment
Simulation, and simulation experiments
chapter
Michael D. Heine
Butterworth & Company
Karen Sparck Jones
variation of relevance judgements (for an algorithmically-defined query say,
and a set of information needs) on Recall at the Precision value 0.60, this
might properly be termed an `experiment' since the relevance-judgements,
one's object of study, are not under the control of the experimenter. But if, for
a given exercise of relevance-judgement by a person (applied to all items in
a test-collection, say) we examined the relative effects on Recall (at P = 0.60)
of varying the document weighting expression, this might better be termed
a `simulation experiment'. In the latter instance, all components of the system
are (a) known, and (b) under control. The outcome of the investigation is
determined by logic: it is `pre-determined'. (We would probably use a
computer to obtain the output data though that is more incidental, and that
alone would not justify a process being termed a simulation.)
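The distinction can be made concrete with a minimal sketch of a `simulation experiment' in the above sense. Everything in it is a hypothetical illustration rather than anything taken from the text: the seven-document collection, the fixed relevance judgements, the two weighting expressions, the tie-breaking rule, and the reading of `Recall at Precision 0.60' as the highest Recall reached at any rank cutoff whose Precision is at least 0.60. Since every component is known and under control, the output follows from logic alone and repeats exactly on every run.

```python
from typing import Callable, Dict, Set

# Hypothetical test collection: each document is a set of index terms.
DOCS: Dict[str, Set[str]] = {
    "d1": {"retrieval", "simulation"},
    "d2": {"retrieval", "indexing"},
    "d3": {"retrieval", "evaluation"},
    "d4": {"retrieval", "library"},
    "d5": {"simulation", "queueing"},
    "d6": {"simulation", "model"},
    "d7": {"retrieval", "catalogue"},
}

# Fixed relevance judgements: one person's single exercise of judgement (assumed).
RELEVANT: Set[str] = {"d1", "d5", "d6"}

# Algorithmically defined query (assumed).
QUERY: Set[str] = {"retrieval", "simulation"}

# Assumed per-term weights used by the second weighting expression.
TERM_WEIGHTS: Dict[str, float] = {"retrieval": 1.0, "simulation": 2.0}


def coordination_weight(terms: Set[str]) -> float:
    """Weighting expression 1: number of query terms the document carries."""
    return float(len(terms & QUERY))


def weighted_sum(terms: Set[str]) -> float:
    """Weighting expression 2: sum of assumed weights of matching query terms."""
    return sum(TERM_WEIGHTS[t] for t in terms & QUERY)


def recall_at_precision(weight: Callable[[Set[str]], float],
                        threshold: float = 0.60) -> float:
    """Highest Recall over rank cutoffs whose Precision is >= threshold."""
    # Rank by descending weight; ties are broken by document id (an assumption).
    ranked = sorted(DOCS, key=lambda d: (-weight(DOCS[d]), d))
    best_recall = 0.0
    hits = 0
    for cutoff, doc_id in enumerate(ranked, start=1):
        if doc_id in RELEVANT:
            hits += 1
        if hits / cutoff >= threshold:
            best_recall = max(best_recall, hits / len(RELEVANT))
    return best_recall


for name, expression in [("coordination", coordination_weight),
                         ("weighted sum", weighted_sum)]:
    print(f"{name}: Recall at P >= 0.60 = {recall_at_precision(expression):.2f}")
```

Only the weighting expression is varied; the relevance judgements are held fixed. Replacing RELEVANT with judgements gathered from real people would bring in the component that the experimenter does not control, which is what the text reserves the word `experiment' for.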
In this introduction, the reasons or motivations for undertaking simulation
studies (in the narrower senses) of information retrieval, in preference to
experimental studies, have not been discussed in detail. This is because they
are fully discussed in general terms in the standard texts on simulation (e.g.
Churchman [8], Martin [9], Gordon [10]), and because the reasons tend to carry
little conviction until one has actually undertaken a simulation study, at
least in the author's experience. But briefly it might be claimed that (a) the
simulation study itself requires that a formal representation of the system
(called a `model' by some writers) be arrived at, this itself giving valuable
insight into the system, (b) manipulation of data and paths within a
conceptual framework of a system (i.e. within a formal structure or model) is
more economical of effort, money and time than manipulation of the real
system so represented and measurement of data pertaining to it (i.e. than
experimentation on the system), which of course begs the question of the
validity of the formal structure, and (c) the simulation suggests new areas for
observational (experimental and investigative) work predicated on the
validity of the constructions that it comprises.
In summary, simulation in its broadest sense is of interest to information
retrieval workers because of the very uncertainty of its definition; because it
provokes our interest in the conceptual roots of representation and transfer
of information (or should we say, of representation and the transfer of
representation?), the main topics of what we call `information science'. This
is of concern in information retrieval experiments both because such
experiments appear to serve as a prototype for experiments on information
transfer construed more generally, and because more formal (theoretical)
study of information transfer may give insights into the process of information
retrieval as we usually regard it. Our experiments and investigations of
document transfer at the macroscopic `human' level are into instances of
information transfer that are perhaps artificially circumscribed. On the other
hand, simulation in its narrowest sense, that of the representation of
randomness in relationships between people, documents, document attributes,
and logical expressions representing `information needs', is of interest
and value in challenging the terms and concepts we use, in distinguishing
between tautological findings (i.e. findings simply a consequence of the
language of description) and findings that are not tautological (`scientific'
findings), and in suggesting new experiments consistent with conjectures
within the language of description. But truths suggested by a simulation
experiment are always suspect, in that the structure of the simulation is one