IRE
Information Retrieval Experiment
Simulation, and simulation experiments
chapter
Michael D. Heine
Butterworth & Company
Karen Sparck Jones
System 6: The preparing of documents by their authors, and the
disseminating of documents.
System 7: The transferring of messages by means other than documents.
System 8: The transferring of messages (with Systems 6 and 7 as
subsystems).
Some of the above can, individually, be broken down into subsystems:
System 8 obviously so, and System 5 into (for example) a purely
record-manipulative subsystem, a logical subsystem, and an economic subsystem.
Conversely, more general systems, of which the above could be regarded as
components, can be defined. For example, combining Systems 2 and 4
poses one challenging problem, in that two parties, the indexer and
the searcher, are each attempting to predict the behaviour of the other, i.e. a
`gaming' system is implied. One side of this has recently been treated by
Maron and W. S. Cooper (and also treated by writers on automatic indexing).
Cooper2 for example has seen indexing as a `thought-experiment' by the
indexer, in which the query terms to be used by the enquirer are anticipated.
The other side, the anticipation by the enquirer of the indexing terms that will
have been used by the indexer (often over a lengthy period of time), is implicit
in work on query optimization, for example that of Ide, Rocchio and Salton,
reviewed by Salton3, and that of Barker, Veal and Wyatt4 and Vernimb5.
Identification of the latter system (System 4) and of the composite system has,
the writer has suggested, been inhibited by the continued use of the misleading
phrase `relevance to a question', which has both obscured the concept of the
question as a variable, and implied that relevance judgements are capable of
being based, unambiguously, on a purely verbal construct6. An ideal would
be the recognition of a generalized system taking in all of the above systems,
as has been attempted in various semi-intuitive representations by some
authors, e.g. Vickery7. Such generalized systems are usually represented as
`circular' in form, in that the output of the system (documents) contributes to
a corpus of recorded knowledge, in relationship to which new information
needs are recognized.
It is suggested that two consequences of the above discussion are: (1)
Judgements as to the informational components and decision-making
components of simulation take on a peculiar significance in information
science, where such components are in fact the main conceptual targets of
the science itself. To attempt to `simulate' relationships between such
components introduces an almost paradoxical situation (we are studying by
systems means what it is to be a system). The effect of this, it is suggested, is
that we should avoid strict definitions of simulation, and be aware that
resolution of the difficulties may provide us with the conceptual roots that we
seek. (2) Information retrieval experiments in the conventional sense (the
Cranfield sense), involving the study and analysis of the effect on retrieval
performance of altering the database (e.g. the depth of indexing in records)
or, say, the Boolean expressions representing users' information needs, relate
to a subsystem of a larger system of information flow (using the term
`information' intuitively). This subsystem, the subsystem of `retrieval from a
database', appears to be an amalgam of what we have labelled Systems 2,
3, 4 and 5, and could be labelled System 9, say.
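As a purely illustrative aside, an experiment on `System 9' of the kind just described might be sketched as follows. The records, index terms, relevance judgements and function names are all invented for the illustration and are taken from no actual test collection; `depth of indexing' is crudely assumed to mean the number of leading index terms retained in each record, and the user's information need is assumed to be expressed as a simple Boolean AND or OR of two terms.

```python
# Toy illustration only: the records, terms and relevance judgements below
# are invented, and all function names are hypothetical.

# Each record lists its index terms in assigned order, so that the depth of
# indexing can be varied by keeping only the first `depth` terms.
RECORDS = {
    "d1": ["retrieval", "indexing", "evaluation"],
    "d2": ["indexing", "thesaurus", "retrieval"],
    "d3": ["simulation", "queueing", "retrieval"],
    "d4": ["evaluation", "simulation", "indexing"],
}

# Assumed relevance judgements for a single information need.
RELEVANT = {"d1", "d2", "d4"}


def index_at_depth(records, depth):
    """Truncate every record to its first `depth` index terms."""
    return {doc_id: set(terms[:depth]) for doc_id, terms in records.items()}


def search_and(index, terms):
    """Boolean AND: retrieve records that carry every query term."""
    wanted = set(terms)
    return {doc_id for doc_id, assigned in index.items() if wanted <= assigned}


def search_or(index, terms):
    """Boolean OR: retrieve records that carry at least one query term."""
    wanted = set(terms)
    return {doc_id for doc_id, assigned in index.items() if wanted & assigned}


def recall_precision(retrieved, relevant):
    """Return (recall, precision); an empty denominator gives 0.0."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def run_experiment(records, relevant, query_terms, depths):
    """Vary indexing depth and Boolean form; record recall and precision."""
    results = {}
    for depth in depths:
        index = index_at_depth(records, depth)
        for name, search in (("AND", search_and), ("OR", search_or)):
            retrieved = search(index, query_terms)
            results[(depth, name)] = recall_precision(retrieved, relevant)
    return results


if __name__ == "__main__":
    outcome = run_experiment(RECORDS, RELEVANT, ["indexing", "evaluation"], [1, 2, 3])
    for (depth, form), (recall, precision) in sorted(outcome.items()):
        print(depth, form, round(recall, 2), round(precision, 2))
```

The two experimental variables of the text, the database and the Boolean expression, appear here as the `depths` argument and the choice between `search_and` and `search_or`. Everything outside that subsystem (the recognition of the need, the preparation of the documents) is simply taken as given, which is the sense in which such an experiment relates to a subsystem only.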
We now contrast simulation in the narrower, mathematical sense, with
investigation and experimentation, and briefly comment on the term