IRE: Information Retrieval Experiment, edited by Karen Sparck Jones (Butterworth & Company). Chapter: Simulation, and simulation experiments, by Michael D. Heine.
by CI, i.e. CI = E(125 - 25TG). Thus when all delivery times have the value 5 (the least effective library possible), CI has the value 0, and when all delivery times are the shortest possible (TG = 1), CI has the value 100. A large sample of requests for documents is examined, and the delivery time of each document is recorded for some given library connected to the document supply system. Details such as attributes of the document and of the requester are also noted. As part of a policy review of the system, we need to know how various policy changes affecting the system are likely to influence its CI value. To simulate the policy changes, we can append a random number (strictly, the value of a random variable U, uniformly distributed on the interval (0,1)) to the set of numeric or qualitative values characterizing each document. Each case in the data set would then comprise (say) the values of variables describing the user and the document, a value for the delivery time, and the random number. (To do this, a file of SPSS data could be read by a simple FORTRAN or ALGOL program that calls a suitable program package, such as that of the Numerical Algorithms Group (NAG), and writes the enriched data to a new file.)
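As an illustration only, and not the SPSS/FORTRAN procedure the text describes, the estimate of CI and the enrichment step might be sketched in Python as follows. The sketch assumes each case is held as a dictionary whose delivery-time code sits under a hypothetical key "TG" (coded 1, fastest, to 5, slowest); the key "U" for the appended random number is likewise an assumption.

```python
import random

# Sketch only: the keys "TG" (delivery-time code, 1 = fastest ... 5 = slowest)
# and "U" (appended random number) are hypothetical names for the case data.


def capability_index(delivery_times):
    """Sample estimate of CI = E(125 - 25TG): the mean of 125 - 25*TG."""
    if not delivery_times:
        raise ValueError("the sample of delivery times is empty")
    return sum(125 - 25 * tg for tg in delivery_times) / len(delivery_times)


def enrich(cases, seed=None):
    """Return copies of the cases, each with a uniform random value U appended.

    random.random() draws from [0.0, 1.0), which stands in here for the
    uniform distribution on (0,1) described in the text.
    """
    rng = random.Random(seed)
    return [dict(case, U=rng.random()) for case in cases]


if __name__ == "__main__":
    print(capability_index([5, 5, 5]))  # 0.0: least effective library
    print(capability_index([1, 1, 1]))  # 100.0: shortest possible delivery times
    sample = [{"TG": 2, "user_status": "S"}, {"TG": 5, "user_status": "T"}]
    print(enrich(sample, seed=1))
```

Copying each case rather than modifying it in place plays the role of writing the enriched data to a new file, so the original sample stays available for comparison.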
The `IF' command of SPSS can then be used to reassign the value of TG on the basis of (a) the user and document data for the case concerned (as appropriate to the policy change of interest), (b) the value of U recorded for the case, and (c) a specified threshold value arrived at by examination of independent evidence. The variable XCI = 125 - 25*TG is computed after this reassignment of TG (using the `COMPUTE' command). Lastly, the `STATISTICS' command of SPSS will yield the mean value of XCI, which is the estimate of the Capability Index, CI, that we seek. For example, suppose the policy option being considered is `obtain all requests for documents that are papers in serials, are not held by the library, and are requested by users of status S, as follows: (1) as photocopies, and (2) from the interlending source J', and suppose that the independently obtained evidence is that 65 per cent of such requests are delivered in time TG = 4 and the rest in time TG = 5. We would then test whether each document in the sample falls under this policy option by using an IF statement to identify documents that were photocopies of serial papers requested by S-type users and, in addition, for which U < 0.65 was true. In those cases we would reassign to TG the value 4. If the document were a photocopy of a serial paper requested by an S-type user but U < 0.65 was false, TG would be given the value 5. After these reassignments, CI would be calculated as usual, its new value indicating the likely effect of implementing the new policy option when all needs are considered. Examples of other policy options that might be considered are giving users direct access to other systems, extending the opening hours of the local collection, and extending the loan period for locally owned documents.
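The reassignment logic of the IF, COMPUTE and STATISTICS steps can be sketched in Python as below. This is a minimal sketch, not the author's SPSS job: the field names "serial_paper", "held_locally", "user_status", "TG" and "U" are hypothetical, the affected cases are taken (as in the policy statement) to be serial papers not held by the library and requested by status-S users, and the four-case sample is invented purely to exercise the code.

```python
# Sketch only: "serial_paper", "held_locally", "user_status", "TG" and "U"
# are hypothetical stand-ins for the SPSS variables described in the text.


def apply_policy(cases, threshold=0.65, fast_tg=4, slow_tg=5):
    """Return copies of the cases with TG reassigned where the policy applies.

    Mirrors the IF logic in the text: an affected case gets TG = fast_tg
    when U < threshold, and TG = slow_tg otherwise.
    """
    result = []
    for case in cases:
        case = dict(case)  # leave the original data untouched
        affected = (
            case["serial_paper"]
            and not case["held_locally"]
            and case["user_status"] == "S"
        )
        if affected:
            case["TG"] = fast_tg if case["U"] < threshold else slow_tg
        result.append(case)
    return result


def capability_index(cases):
    """Mean of XCI = 125 - 25*TG over the cases (the estimate of CI)."""
    if not cases:
        raise ValueError("no cases to average")
    xci = [125 - 25 * case["TG"] for case in cases]  # the COMPUTE step
    return sum(xci) / len(xci)  # the mean reported by the statistics step


if __name__ == "__main__":
    # Invented illustrative cases, not data from the text.
    sample = [
        {"TG": 5, "serial_paper": True, "held_locally": False, "user_status": "S", "U": 0.10},
        {"TG": 3, "serial_paper": True, "held_locally": False, "user_status": "S", "U": 0.90},
        {"TG": 2, "serial_paper": False, "held_locally": True, "user_status": "S", "U": 0.10},
        {"TG": 1, "serial_paper": True, "held_locally": False, "user_status": "T", "U": 0.10},
    ]
    print("CI before:", capability_index(sample))  # 56.25
    print("CI after:", capability_index(apply_policy(sample)))  # 50.0
```

Because U is drawn independently of the other variables, over a large sample roughly 65 per cent of the affected cases receive TG = 4; a case with U exactly equal to the threshold counts as "U < 0.65 false" and receives TG = 5.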
We note that the use of SPSS in this way assumes independence between certain random variables implied by the raw data, a reasonable assumption if no contrary evidence is available.
Example 2 (Morse's model of browsing in relegating collections, as treated by Salton)
This example offers a description of a system (a document supply system, or document record supply system) which has the following properties: (1) the I