IRE
Information Retrieval Experiment
Simulation, and simulation experiments
chapter
Michael D. Heine
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
by CI, i.e. CI=E(125-25TG). Thus when delivery times all have the value 5
(the least effective library possible), CI has the value 0, and when delivery
times are the shortest possible, CI has the value 100. A large sample of
requests for documents is examined and data obtained on the delivery time
for each document, for some given library connected to the document supply
system. Details such as attributes of the document and the requestor are also
noted. It is required to know how various policy-changes affecting the system
are likely to influence its CI-value, as part of a policy review of the system. To
simulate the policy changes we can append a random number (strictly, the
value of a random variable U, uniform in the interval (0,1)) to the set of
numeric or qualitative values characterizing each document. The data set
would then comprise (say) the values of variables describing the user, the
document, a value for the delivery time, and the random number. (To do this,
a file of SPSS data could be read by a simple FORTRAN or ALGOL
program accessing a suitable program package, such as that of the Numerical
Algorithms Group (NAG), outputting the enriched data to a new file.) The
`IF' command of SPSS can then be used to reassign the value of TG on the
basis of (a) the user and document data for the case concerned (as appropriate
to the policy change of interest), (b) the value of U recorded for the case, and
(c) a specified threshold value arrived at by examination of independent
evidence. The variable XCI = 125 - 25*TG is computed after this reassignment
of TG (using the `COMPUTE' command). Lastly, the `STATISTICS'
command of SPSS will yield the mean value of XCI, which
is the value of the Capability Index, CI, that we seek. For example,
suppose the policy option being considered is `obtain all requests for
documents that are papers in serials and are not held by the library, and
which are requested by users of status S, as follows: (1) as photocopies, and
(2) from the interlending source J', and suppose that the independently
obtained evidence is that 65 per cent of such requests are
delivered in time TG = 4, the rest in time TG = 5. Then we would test
whether each document in the sample falls under this policy option by
using an IF statement to identify documents that were both photocopies of
serial papers and requested by S-type users and, in addition, for which
U < 0.65 was true. In those cases we would reassign to TG the value 4. If the
document were a photocopy of a serial paper and requested by an S-type user
but U < 0.65 was false, TG would be given the value 5. After these
reassignments, the value of CI would be calculated as usual, the new value of
it indicating the likely effect of implementing the new policy option when all
needs are considered. Examples of other policy options that might be
considered are those of giving users direct access to other systems, extending
the hours of opening of the local collection and extending the loan period for
locally-owned documents. We note that the use of SPSS in this way assumes
independence between certain random variables implied by the raw data,
a reasonable assumption if no contrary evidence is available.
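The procedure of Example 1 can also be sketched outside SPSS. The following Python fragment is a minimal illustration only, not the method used in the text: the record fields (user_status, is_serial_paper, held_locally), the function names and the four-request sample are all hypothetical, and TG is assumed to take integer values from 1 (shortest delivery time) to 5 (longest), as the CI values of 100 and 0 quoted above imply.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    # Hypothetical field names: the text says only that user and document
    # attributes are recorded together with the delivery time TG.
    user_status: str       # e.g. "S"
    is_serial_paper: bool  # document is a paper in a serial
    held_locally: bool     # document is held by the library
    tg: int                # delivery time, assumed 1 (shortest) to 5 (longest)


def capability_index(tgs):
    """Estimate CI = E(125 - 25*TG) as the sample mean of XCI."""
    return mean(125 - 25 * tg for tg in tgs)


def simulate_policy(requests, threshold=0.65, rng=None):
    """Return the TG values after reassignment under the policy option."""
    if rng is None:
        rng = random.Random()
    new_tgs = []
    for r in requests:
        u = rng.random()  # value of U, uniform on [0, 1)
        affected = (
            r.user_status == "S" and r.is_serial_paper and not r.held_locally
        )
        if affected:
            # The role of the SPSS IF command: TG = 4 when U < threshold,
            # otherwise TG = 5.
            new_tgs.append(4 if u < threshold else 5)
        else:
            new_tgs.append(r.tg)  # cases outside the policy keep their TG
    return new_tgs


# Illustrative sample only; these are not data from the study.
sample = [
    Request("S", True, False, 5),
    Request("S", True, False, 3),
    Request("T", True, False, 2),
    Request("S", False, True, 1),
]

before = capability_index(r.tg for r in sample)
after = capability_index(simulate_policy(sample, rng=random.Random(0)))
print("CI before:", before, "CI after:", after)
```

With the 65 per cent threshold, each request covered by the policy contributes on average 0.65 x 25 + 0.35 x 0 = 16.25 to the mean of XCI, so whether the simulated CI rises or falls depends on the delivery times those requests had before the change.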
Example 2 (Morse's model of browsing in relegating collections, as treated by
Salton)
This example offers a description of a system (document supply system, or
document record supply system) which has the following properties: (1) the