IRE
Information Retrieval Experiment
An experiment: search strategy variations in SDI profiles
chapter
Lynn Evans
Butterworth & Company
Karen Sparck Jones
All rights reserved. No part of this publication may be reproduced
or transmitted in any form or by any means, including photocopying
and recording, without the written permission of the copyright holder,
application for which should be addressed to the Publishers. Such
written permission must also be obtained before any part of this
publication is stored in a retrieval system of any nature.
(1) The output for the boolean strategy B (the only strategy producing an
unranked output) was counted (say x items), as were the numbers of
relevance R1 (say y) and relevance R1/2 (say z) documents it contained.
(2) For the remaining strategies in turn, starting from the top of the ranked
output, the numbers of items taken to retrieve y relevance R1 and z
relevance R1/2 documents were noted (say x1 and x2 items respectively).
If both x1 and x2 for a particular strategy are less than the boolean output
x, then that strategy is performing better than the boolean strategy B for both
R1 and R1/2 documents. If x1 < x < x2, then that strategy is performing
better than strategy B for R1 documents but not for R1/2 documents. And so
on.
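The comparison in steps (1) and (2) can be sketched in code. The following is a minimal illustration, not the procedure as implemented in the experiment: the function names, the grade labels "R1", "R1/2" and "N" (not relevant), and the sample data are all assumptions made for the example. It also assumes that when the boolean output contains no documents of a given grade, no verdict is returned for that grade.

```python
def items_needed(ranked_grades, grade, target):
    """Count items from the top of a ranked output needed to retrieve
    `target` documents of the given relevance grade (hypothetical helper)."""
    found = 0
    for position, g in enumerate(ranked_grades, start=1):
        if g == grade:
            found += 1
            if found == target:
                return position
    return None  # the ranked output never reaches the target


def compare_with_boolean(boolean_grades, ranked_grades):
    """Compare one ranked strategy against the boolean output.

    x  = size of the boolean output
    x1 = items needed to retrieve as many R1 documents as the boolean output
    x2 = items needed to retrieve as many R1/2 documents as the boolean output
    """
    x = len(boolean_grades)
    result = {"x": x}
    for grade, key in (("R1", "x1"), ("R1/2", "x2")):
        target = boolean_grades.count(grade)  # y for R1, z for R1/2
        if target == 0:
            # Assumption: no basis for comparison on this grade.
            result[key] = None
            result["better_" + grade] = None
            continue
        needed = items_needed(ranked_grades, grade, target)
        result[key] = needed
        result["better_" + grade] = needed is not None and needed < x
    return result


# Invented data: boolean output of 5 items holding 2 R1 and 1 R1/2 documents.
boolean_output = ["R1", "N", "R1/2", "N", "R1"]
ranked_output = ["R1", "R1", "N", "N", "N", "R1/2"]
result = compare_with_boolean(boolean_output, ranked_output)
print(result)
```

With these invented data x = 5, x1 = 2 and x2 = 6, which is the case x1 < x < x2: the ranked strategy does better than strategy B for R1 documents but not for R1/2 documents.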
This procedure gave a best strategy (or joint best strategies) for each query
on each run. The main disadvantage of the method is that it is entirely
dependent on the boolean output; if there is no boolean output or if the
boolean output contains no relevance R1 or R1/2 documents, there is no
basis for comparison. Also, under the conditions of this experiment, a
comparison is not valid if the boolean output is very large.
Aggregating, from runs 1, 5 and 6, the results for the top 3 search strategies
only for each query, a ranking of search strategies was obtained (Table 14.6).
This shows the percentage of times each search strategy occupied one of the
top 3 positions. Incidentally, in the calculations, where, for example, two (or
more) strategies were equal best they were both (all) ranked first but the next
best was ranked third (or fourth, etc., as appropriate). That is, a jointly held
first position was considered of equal merit to a uniquely held first position.
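The tie-handling rule described here (tied strategies share a rank and the next rank is skipped) can be illustrated as follows. This is a sketch under stated assumptions: the text does not say how a single figure of merit per strategy was derived, so the example takes a hypothetical score per strategy for which lower is better, and it takes the percentage over query-and-run cases. The scores are invented and are not results from Table 14.6.

```python
from collections import Counter


def competition_ranks(scores):
    """Rank strategies by score (lower is better, an assumption).
    Tied strategies share a rank and the following rank is skipped."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    ranks = {}
    for position, (name, score) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and previous[1] == score:
            ranks[name] = ranks[previous[0]]  # share the tied rank
        else:
            ranks[name] = position
    return ranks


def top3_percentages(cases):
    """Percentage of cases in which each strategy holds a top-3 rank."""
    counts = Counter()
    for scores in cases:
        for name, rank in competition_ranks(scores).items():
            if rank <= 3:
                counts[name] += 1
    return {name: 100.0 * n / len(cases) for name, n in counts.items()}


# Invented scores for two query/run cases.
cases = [
    {"BW": 4, "GWC": 4, "TWC": 7, "CG": 9},
    {"BW": 3, "GWC": 5, "TWC": 6, "CG": 6},
]
ranks_first_case = competition_ranks(cases[0])
percentages = top3_percentages(cases)
print(ranks_first_case)
print(percentages)
```

In the first invented case BW and GWC are both ranked first and TWC is ranked third, so a jointly held first position counts the same as a uniquely held one.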
It is seen from Table 14.6 that, by this boolean comparison method, clearly
the two best strategies are respectively BW and GWC, with strategy TWC,
not so decisively, third best. The least promising strategy according to this
method is CG. It is interesting that in the ranked-output comparison based
on normalized recall (p. 300), strategies GWC and TWC are practically
indistinguishable whereas in the boolean comparison strategy GWC comes
out better. A possible explanation for this difference was thought to lie in the
methods of comparison: the ranked-output method is essentially 'neutral'
but the boolean comparison, being based on the boolean output, may be
more oriented towards those strategies which comprise term groups (e.g.
GWC) rather than a single list of terms (e.g. TWC). However, this is not
confirmed by the relative positions of strategies CT (single list of terms) and
CG (term groups) in the two comparisons.
In addition to the disadvantages of the boolean comparison method
mentioned above, it is now thought that there may be more fundamental
objections. Not only are strategies B and BW considered to be misleadingly
rated relative to the other strategies (B too low, BW too high) but the whole
concept of evaluating an optimum boolean (yes/no) output against a ranked
output may be questionable: it is not comparing like with like. On the other
hand, one of the principal criticisms of the experiment might be said to have
been the failure to develop a valid method for comparing optimum boolean
outputs with ranked outputs. To use a less-than-optimum boolean statement
as a weak filter in a first pass of the document collection and then to rank the
resulting output by some weighting scheme is a useful experimental
convenience but it does not correspond to the strategy BW in this experiment.
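For illustration only, the two-pass arrangement mentioned in the last sentence (a weak boolean filter followed by weighted ranking, which the text says is not strategy BW) might be sketched as below. Everything here is an assumption: the filter is taken to be a simple OR over a few terms, the score is taken to be the sum of the weights of the matching profile terms, and the documents, terms and weights are invented.

```python
def filter_then_rank(documents, filter_terms, weights):
    """First pass: keep documents matching any filter term (a weak OR filter).
    Second pass: rank the survivors by the summed weights of their terms."""
    passed = {
        doc_id: terms
        for doc_id, terms in documents.items()
        if terms & filter_terms
    }
    scored = [
        (sum(weights.get(term, 0) for term in terms), doc_id)
        for doc_id, terms in passed.items()
    ]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


# Invented documents, each represented as a set of index terms.
documents = {
    "d1": {"sdi", "profile"},
    "d2": {"boolean"},
    "d3": {"sdi", "weight", "profile"},
    "d4": {"chemistry"},
}
filter_terms = {"sdi", "boolean"}
weights = {"sdi": 2, "profile": 1, "weight": 3, "boolean": 1}
ranking = filter_then_rank(documents, filter_terms, weights)
print(ranking)
```

Document d4 is removed by the filter and never scored, which is why such an arrangement is a convenience for ranking rather than an evaluation of an optimum boolean statement.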