Information Retrieval Experiment

An experiment: search strategy variations in SDI profiles
Lynn Evans. In: Information Retrieval Experiment, Karen Sparck Jones (ed.), Butterworth & Company

(1) The output for the boolean strategy B (the only strategy producing an unranked output) was counted (say x items), as were the numbers of relevance R1 (say y) and relevance R1/2 (say z) documents it contained.

(2) For each of the remaining strategies in turn, the number of items, counted from the top of the ranked output, needed to retrieve y relevance R1 and z relevance R1/2 documents was noted: say x1 and x2 items respectively.

If both x1 and x2 for a particular strategy are less than the boolean output x, then that strategy is performing better than the boolean strategy B for both R1 and R1/2 documents. If x1 < x < x2, then that strategy is performing better than strategy B for R1 documents but not for R1/2 documents, and so on. This procedure gave a best strategy (or joint best strategies) for each query on each run.

The main disadvantage of the method is that it depends entirely on the boolean output: if there is no boolean output, or if the boolean output contains no relevance R1 or R1/2 documents, there is no basis for comparison. Also, under the conditions of this experiment, the comparison is not valid if the boolean output is very large. By aggregating the results for only the top three search strategies for each query from runs 1, 5 and 6, a ranking of search strategies was obtained (Table 14.6).
Table 14.6 shows the percentage of times each search strategy occupied one of the top three positions. In these calculations, where two (or more) strategies were equal best, they were all ranked first, but the next best was ranked third (or fourth, etc., as appropriate); that is, a jointly held first position was considered of equal merit to a uniquely held first position. Table 14.6 shows that, by this boolean comparison method, the two best strategies are clearly BW and GWC, in that order, with TWC a less decisive third. The least promising strategy by this method is CG.

It is interesting that in the ranked-output comparison based on normalized recall (p. 300), strategies GWC and TWC are practically indistinguishable, whereas in the boolean comparison GWC comes out better. A possible explanation for this difference was thought to lie in the methods of comparison: the ranked-output method is essentially 'neutral', but the boolean comparison, being based on the boolean output, may be more oriented towards strategies that comprise term groups (e.g. GWC) than towards those using a single list of terms (e.g. TWC). However, this explanation is not supported by the relative positions of strategies CT (single list of terms) and CG (term groups) in the two comparisons.

In addition to the disadvantages of the boolean comparison method mentioned above, it is now thought that there may be more fundamental objections. Not only are strategies B and BW considered to be misleadingly rated relative to the other strategies (B too low, BW too high), but the whole concept of evaluating an optimum boolean (yes/no) output against a ranked output may be questionable, since it does not compare like with like. On the other hand, one of the principal criticisms of the experiment might be said to have been its failure to develop a valid method for comparing optimum boolean outputs with ranked outputs.
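The boolean comparison method of steps (1) and (2), together with the tie-handling rule used in compiling Table 14.6, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the procedure as actually programmed for the experiment: documents are plain string identifiers, the sets of R1 and R1/2 documents are supplied as inputs, and all function names are hypothetical. This section does not say exactly how a single best strategy was chosen per query, so `competition_ranks` simply takes one score per strategy (for instance x1, where fewer items is better); that choice of score is an assumption.

```python
from typing import Dict, List, Optional, Set, Tuple


def items_needed(ranked: List[str], relevant: Set[str], target: int) -> Optional[int]:
    """Items taken, counting from the top of a ranked output, to retrieve
    `target` documents from `relevant`. Returns None when there is no basis
    for comparison (target is 0) or the output never reaches the target."""
    if target == 0:
        return None
    found = 0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            found += 1
            if found == target:
                return position
    return None


def compare_with_boolean(boolean_output: Set[str], ranked: List[str],
                         r1: Set[str], r12: Set[str]
                         ) -> Tuple[int, Optional[int], Optional[int]]:
    """Steps (1) and (2): return (x, x1, x2) for one ranked strategy."""
    x = len(boolean_output)          # size of the unranked boolean output
    y = len(boolean_output & r1)     # relevance R1 documents it contains
    z = len(boolean_output & r12)    # relevance R1/2 documents it contains
    return x, items_needed(ranked, r1, y), items_needed(ranked, r12, z)


def beats_boolean(x: int, xi: Optional[int]) -> bool:
    """A strategy does better than B for a grade if it needs fewer than x items."""
    return xi is not None and xi < x


def competition_ranks(scores: Dict[str, int]) -> Dict[str, int]:
    """Lower score is better (assumed score, e.g. x1). Tied strategies share a
    rank and the next rank is skipped: two joint firsts are followed by a third."""
    return {s: 1 + sum(other < v for other in scores.values())
            for s, v in scores.items()}


def top3_percentages(per_query_scores: List[Dict[str, int]]) -> Dict[str, float]:
    """Percentage of queries in which each strategy occupied a top-3 position."""
    counts: Dict[str, int] = {}
    for scores in per_query_scores:
        for strategy, rank in competition_ranks(scores).items():
            counts[strategy] = counts.get(strategy, 0) + (rank <= 3)
    return {s: 100.0 * c / len(per_query_scores) for s, c in counts.items()}


# Toy data, made up for illustration (not results from the experiment).
boolean_output = {"d1", "d2", "d3", "d4", "d5"}
r1, r12 = {"d1", "d3"}, {"d2", "d7"}
x, x1, x2 = compare_with_boolean(boolean_output, ["d3", "d1", "d7", "d2", "d9"], r1, r12)
print(x, x1, x2)                                   # 5 2 3
print(beats_boolean(x, x1), beats_boolean(x, x2))  # True True
print(competition_ranks({"s1": 2, "s2": 2, "s3": 3, "s4": 7}))
# {'s1': 1, 's2': 1, 's3': 3, 's4': 4}
```

Returning None rather than a number when the boolean output holds no relevant documents mirrors the stated weakness of the method: with nothing to match, the comparison simply has no answer.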
Using a less-than-optimum boolean statement as a weak filter in a first pass over the document collection, and then ranking the resulting output by some weighting scheme, is a useful experimental convenience, but it does not correspond to strategy BW in this experiment.
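The two-pass convenience described in the sentence above (a loose boolean filter followed by weighted ranking) might be sketched like this. Every concrete detail is an assumption made for illustration: documents are represented as sets of index terms, the "less-than-optimum" boolean statement is taken to be an OR over all profile terms, and "some weighting scheme" is stood in for by a simple sum of hypothetical term weights. As the text stresses, this is not strategy BW.

```python
from typing import Callable, Dict, List, Set


def filter_then_rank(collection: Dict[str, Set[str]],
                     weak_filter: Callable[[Set[str]], bool],
                     weights: Dict[str, float]) -> List[str]:
    """Pass 1: keep only documents whose index terms satisfy a deliberately
    loose boolean statement. Pass 2: rank the survivors by the summed weights
    of the profile terms they contain (highest first, ties by document id)."""
    survivors = {d: terms for d, terms in collection.items() if weak_filter(terms)}

    def score(doc_id: str) -> float:
        return sum(weights.get(term, 0.0) for term in survivors[doc_id])

    return sorted(survivors, key=lambda d: (-score(d), d))


# Hypothetical profile terms and weights, chosen only for illustration.
weights = {"reactor": 3.0, "coolant": 2.0, "sodium": 1.0}


def any_profile_term(terms: Set[str]) -> bool:
    """A weak filter (assumed): an OR over all profile terms."""
    return not terms.isdisjoint(weights)


collection = {
    "d1": {"reactor", "coolant"},
    "d2": {"sodium"},
    "d3": {"turbine"},
    "d4": {"coolant", "sodium"},
}
print(filter_then_rank(collection, any_profile_term, weights))  # ['d1', 'd4', 'd2']
```

The filter only decides which documents are scored at all; all the discrimination between them comes from the second pass, which is why such a set-up is convenient for experiments but is a different kind of strategy from an optimum boolean statement.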