IRE
Information Retrieval Experiment
Simulation, and simulation experiments
chapter
Michael D. Heine
Butterworth & Company
Karen Sparck Jones
Some previous work in simulation applied to information retrieval
month (from 1 to 100 000), and item input rate per month (from 1 to 100 000
also), to produce an estimated cost range of from $188 000 to $558 000 for one
system, and from $166 000 to $551 000 for a second system.)
Baker and Nance24 report on a study in which the 'system' is defined more
generally, so as to include both the users and the funders of the retrieval
service, their point being that a more restricted view may lead to
suboptimization. (To optimize in respect of system response-time alone, or
in respect of unit retrieval cost alone, may be to ignore the costs (or disutility)
to the user entailed in (a) noisy (low-Precision) search output, and (b) actual
usage of the system, such costs being, possibly, the main causes of low system
usage or of a poor reputation amongst users. For a related, sceptical
viewpoint, see W. S. Cooper31.) Baker and Nance assume, accordingly, that
the funding and operating of a system must be seen as being influenced by
user costs and convenience, or utility. The relationships of interest to them
are portrayed in two detailed diagrams, and a table of descriptive content.
Although the first diagram is a general one, the second, and the table, are
relevant to a system having the form of a university departmental library, i.e.
to a highly specific system only. The model of the system that is given is
moreover only indicative, and no tangible results of the study are given or
appear to have been published since.
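
The suboptimization argument attributed above to Baker and Nance can be illustrated by a small numerical sketch. The three configurations, their cost figures, and the names `Config`, `system_cost` and `user_cost` are all hypothetical, invented here purely for illustration; they are not taken from the Baker and Nance study, which reports no such results. The sketch shows only that the configuration that is cheapest for the operator need not be the cheapest once user costs are counted as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hypothetical system configuration (all figures are invented)."""
    name: str
    system_cost: float  # cost per search borne by the funder or operator
    user_cost: float    # cost per search borne by the user, e.g. time
                        # spent sifting noisy (low-Precision) output

    @property
    def total_cost(self) -> float:
        # The wider 'system' view: operator cost plus user cost.
        return self.system_cost + self.user_cost


# Illustrative values only; not data from any study.
CONFIGS = [
    Config("cheap, noisy output", system_cost=2.0, user_cost=9.0),
    Config("moderate", system_cost=4.0, user_cost=3.0),
    Config("expensive, precise output", system_cost=8.0, user_cost=1.0),
]

# Restricted view: minimize the operator's cost alone.
narrow = min(CONFIGS, key=lambda c: c.system_cost)

# Wider view: minimize the combined cost to operator and user.
broad = min(CONFIGS, key=lambda c: c.total_cost)

print("Optimum on system cost alone: ", narrow.name)
print("Optimum on system + user cost:", broad.name)
```

With these assumed figures the two criteria select different configurations, which is the sense in which optimizing a restricted definition of the system may be a suboptimization. A simple sum of the two costs is itself an assumption; any other way of combining them could be substituted in `total_cost`.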
Reilly's report26 is unusual in that he was concerned with a single user and
a single type of service, an approach that the Swets model13,14 can also be
interpreted as embodying. Reilly's study assumes that a user estimates both
the delivery time of a document from a document-delivery service, and the
utility of the service to him prior to making a request from the service. The
user's subsequent behaviour is then determined by the truth-values of the
inequalities: estimated service time ≤ need time, and actual service
time ≤ need time, the former being modified with each decision and system
response. Since the estimated service time is not an observable in an
operational system (though it could be in an experimental environment) the
model may not be acceptable to those who insist that simulation should deal
only in observables. But non-observables are perhaps acceptable if one can,
by assuming their existence and properties, successfully predict observable
outcomes using them: the proof of the pudding. Reilly's approach would
seem to bring retrieval work closer to the point where user/service interaction
is properly heeded and accounted for, as a basis for the fuller system
definition needed for the efficient management of information services. The
point is in fact made by Reilly (and is also implied in the Baker and Nance
paper) that integration of models of different areas of information supply is
essential, although he does not attempt this himself. Three such areas or 'levels' are
singled out by him in this connection: computer processing centre activities,
determination of user behaviour (his main concern), and the delivery of
documents. A further point in common between Reilly's report and Baker
and Nance's study is that 'a library' should be treated as an information
system. Although this has been a commonplace idea in US writings for many
years (and is a basis of Salton's recent monograph) there is still a regrettable
reluctance in the UK to view libraries (document supply systems) in the same
light as information retrieval services (document record supply systems),
notwithstanding the common problems each has and the strong interactions
that necessarily exist between them. Simulation studies, in offering an