IRE Information Retrieval Experiment Simulation, and simulation experiments chapter Michael D. Heine Butterworth & Company Karen Sparck Jones

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

System 6: The preparing of documents by their authors, and the disseminating of documents.
System 7: The transferring of messages by means other than documents.
System 8: The transferring of messages (with Systems 6 and 7 as subsystems).

Some of the above can, individually, be broken down into subsystems: System 8 obviously so, and System 5 into (for example) a purely record-manipulative subsystem, a logical subsystem, and an economic subsystem. Conversely, more general systems, of which the above could be regarded as components, can be defined. For example, combining Systems 2 and 4 provides one challenging problem here, in that two parties, the indexer and the searcher, are each attempting to predict the behaviour of the other, i.e. a 'gaming' system is implied. One side of this has recently been treated by Maron and W. S. Cooper (and also by writers on automatic indexing). Cooper², for example, has seen indexing as a 'thought-experiment' by the indexer, in which the query terms to be used by the enquirer are anticipated. The other side, the anticipation by the enquirer of the indexing terms that will have been used by the indexer (often over a lengthy period of time), is implicit in work on query optimization, for example that of Ide, Rocchio and Salton reviewed by Salton³, and that of Barker, Veal and Wyatt⁴ and Vernimb⁵.
Identification of the latter system (System 4) and of the composite system has, the writer has suggested, been inhibited by the continued use of the misleading phrase 'relevance to a question', which has both obscured the concept of the question as a variable and implied that relevance judgements are capable of being based, unambiguously, on a purely verbal construct⁶.

An ideal would be the recognition of a generalized system taking in all of the above systems, as has been attempted in various semi-intuitive representations by some authors, e.g. Vickery⁷. Such generalized systems are usually represented as 'circular' in form, in that the output of the system (documents) contributes to a corpus of recorded knowledge, in relationship to which new information needs are recognized.

It is suggested that two consequences of the above discussion are:

(1) Judgements as to the informational components and decision-making components of simulation take on a peculiar significance in information science, where such components are in fact the main conceptual targets of the science itself. To attempt to 'simulate' relationships between such components introduces an almost paradoxical situation (we are studying by systems means what it is to be a system). The effect of this, it is suggested, is that we should avoid strict definitions of simulation, and be aware that resolution of the difficulties may provide us with the conceptual roots that we seek.

(2) Information retrieval experiments in the conventional sense (the Cranfield sense), involving the study and analysis of the effect on retrieval performance of altering the database (e.g. the depth of indexing in records) or, say, the Boolean expressions representing users' information needs, relate to a subsystem of a larger system of information flow (using the term 'information' intuitively).
This subsystem, the subsystem of 'retrieval from a database', appears to be an amalgam of what we have labelled as Systems 2, 3, 4 and 5, and could be labelled, say, System 9.

We now contrast simulation in the narrower, mathematical sense with investigation and experimentation, and briefly comment on the term