Information Retrieval Experiment

IRE Information Retrieval Experiment Simulation, and simulation experiments chapter Michael D. Heine Butterworth & Company Karen Sparck Jones All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature. Examples of simulation models in information retrieval studies 191 (6) Plot the resulting surface, find the centre of mass, etc. The algorithm can be modified to delete (R, P) points that arise from combinations involving the all-negated elementary conjunct, as mentioned previously, in which case the grand total changes. We note also that it is doubtful if the algorithm could easily be implemented for N in excess of 4, due to the combinatorial explosion in number of distinct search expressions. (For N=5 this number is 232=4.3 x 10[OCRerr].) So what the algorithm produces is the probability distribution on the R-P outcome space when a user of a database selects a search expression arbitrarily, for some given set of search terms. We turn now to the second problem. Suppose that we restrict attention in the first place to the set of DWFs that each map the 16 elementary conjuncts to 16 distinct real numbers. These DWFs can be distinguished and classified by the rankings (permutations) of the elementary conjuncts that they effect, as previously mentioned, there being 16! classes for DWFs of this type. Our interest is in the hypothetical situation of an enquirer (1) choosing a DWF randomly from one of these classes, and (2) choosing a threshold value randomly from the values 1,2,3,. . . , 16. An algorithm to generate the probability distribution over the Recall-Precision outcome space for this situation (for this species of DWF, for a given instance of information need, and for a given query qua set of terms) is as follows: (1) Define a permutation of the elementary conjuncts. Set J equal to 1. Read G. (2) Define R= [OCRerr] r[OCRerr], F= Z[OCRerr]1=6[OCRerr]f Infer P (3) Put the resulting (R,P) co-ordinate into a cell of a grid defined over (0,1) x (0,1). (4) Increment J by 1. If J< 2N then go to step 2 else define a new permutation and go to step 2. (If no new permutation is possible go to the next step.) (5) Divide each total of co-ordinates in the cell grid by the grand total of (R,P) points, i.e. by 2N(2N!). (6) Plot the resulting surface, find the centre of mass, etc. The surface could be termed the `document weighting surface' of a set of terms. As noted before, permutations that throw the all-negated elementary conjunct to rank positions other than 1 might be discounted. Also, we could ignore (R,P) points arising from threshold values of 1 if we wished since, unless r[OCRerr] 0, this relates to retrieval of the entire database. (It is recognized as unlikely that an enquirer will require a Recall value of 1.) More complicated algorithms might be identified that produce the Recall-Precision probability surface for DWFs that only weakly order the elementary conjuncts. The three examples we have given are intended to represent widely different approaches to simulation in information retrieval study. The first was concerned with the `random number' simulation technique serving in that case to show how the effects of choosing different policy options affecting document delivery speed in 6ne type of information environment could be predicted and compared. The second example served to emphasize the necessity of imposing clear definitions on everyday words[OCRerr]in that case `browsing'-for simulation to be undertaken at all, and perhaps also