IRE
Information Retrieval Experiment
Simulation, and simulation experiments
chapter
Michael D. Heine
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
suggested limitations on the validity of a simulation study when very severely
limiting definitions are used. The third example showed how purely formal
constructions can usefully be discussed and compared in a particular context
using familiar information retrieval concepts, with no additional definition
and dealing only in observables.
10.3 Some previous work in simulation applied to information retrieval
For reasons given in the introduction, it is in principle impossible to delimit
the literature on simulation applied to information retrieval in a satisfactory
way. The modelling or representational element, so essential to simulation,
is often present in general discussions on retrieval that do not specifically
refer to simulation by name. Writers will frequently have used the `language'
of simulation without necessarily having used simulation techniques in the
narrower sense for exploring the relationships that they discuss, or without
having been particularly concerned with optimization of, or intervention in,
the process described. We shall adopt as our rather arbitrary criterion for
inclusion that systems be formally represented and that relationships between
system components be systematically explored using plausible or experimentally
obtained values for the variables involved, with the methodological
emphasis on the manipulation of such data. The works meeting this criterion
appear to be few in number20-30. Gurk's paper20 is more an indicative
description of a prototype of an information retrieval system than a
description of a simulation of it. Useful comment directed at simulation
work in the general information retrieval context has been offered by
Chapman1 and Salton1 2, the latter's monograph reviewing the main models.
The paper by Bourne and Ford22 is concerned with the economics of
information retrieval systems. The objective was to estimate the operating
cost, and the amounts of equipment and personnel, needed over a given time
period by several hardware information retrieval configurations. Their paper
makes the point that on the basis of known data, and a knowledge of the
gross characteristics of a proposed system, the costs that would be borne in
the future can be arrived at by solely manipulative means much more cheaply
than by actually building and testing the system, thus underlining one of the
basic reasons for undertaking simulation studies. The `known data' is
grouped by them under three headings: `Time and Cost Data' (wage rates,
costs of materials, equipment purchase and maintenance costs, stationery,
etc.), `Statements of Interrelationships' (e.g. item input rate per person,
search time per request), and `Constants' (e.g. amortization period of
purchased equipment, interest rate on borrowed capital). Bourne and Ford
comment appropriately that the credibility of their type of analysis depends
upon the accuracy and completeness of both the analysis of the proposed
system and the basic time and cost data, but perhaps they do not sufficiently
emphasize the vulnerability of such analyses to rapid technological
obsolescence. A further useful point brought out by them is that the sensitivity
of operating costs (say), or of other measures of efficiency or effectiveness,
to changes in the input variables can be explored in a simulation study.
(They quote data for annual expenditure
as a function of the two independent variables: number of searches per