IRE: Information Retrieval Experiment, edited by Karen Sparck Jones (Butterworth & Company). Chapter: An experiment: search strategy variations in SDI profiles, by Lynn Evans.

All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, including photocopying and recording, without the written permission of the copyright holder, application for which should be addressed to the Publishers. Such written permission must also be obtained before any part of this publication is stored in a retrieval system of any nature.

In an attempt to obtain more detailed information, the raw retrieval data for run 5 were reworked using two additional cutoff points (positions 1 and 2 in the ranked outputs), and the recall figures were calculated both by averages of numbers and by averages of ratios. In the former, the (overall) mean recall ratio is calculated by dividing the total number of relevant items retrieved for all the queries by the total number of known relevant items for all the queries. In the latter, a recall ratio is calculated individually for each query, and the mean of these ratios represents the overall system (or, in our case, strategy) recall ratio. It is not obvious which method of calculation is better; it is clear only that if the number of relevant items varies widely from query to query, the two methods may not give the same results. Moreover, if the two methods do not point to the same conclusions, there must be some doubt about drawing any conclusions at all.

Table 14.4 shows the results of these additional calculations on the run 5 data. Both methods agree that TWC and GWC are the two best strategies, but below them the relative positions of the search strategies are not consistent.
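The difference between the two averaging methods can be seen in a minimal Python sketch. The data structure and function names below are hypothetical, and the counts are invented for illustration; they are not taken from the run 5 data. With two queries whose numbers of known relevant items differ widely (2 and 20), the two methods rank strategy A and strategy B in opposite orders, which is the situation in which drawing conclusions becomes doubtful.

```python
from dataclasses import dataclass


@dataclass
class QueryResult:
    """Counts for one query at one cutoff point (hypothetical structure)."""
    relevant_retrieved: int  # relevant items retrieved at the cutoff
    relevant_known: int      # all known relevant items for the query


def recall_average_of_numbers(results: list[QueryResult]) -> float:
    """Pool the counts over all queries, then form a single ratio."""
    total_retrieved = sum(r.relevant_retrieved for r in results)
    total_known = sum(r.relevant_known for r in results)
    return total_retrieved / total_known


def recall_average_of_ratios(results: list[QueryResult]) -> float:
    """Form a recall ratio for each query, then take the mean of the ratios."""
    ratios = [r.relevant_retrieved / r.relevant_known for r in results]
    return sum(ratios) / len(ratios)


# Invented counts for two strategies on the same two queries.
strategy_a = [QueryResult(2, 2), QueryResult(2, 20)]
strategy_b = [QueryResult(1, 2), QueryResult(8, 20)]

for name, results in (("A", strategy_a), ("B", strategy_b)):
    print(name,
          round(recall_average_of_numbers(results), 3),
          round(recall_average_of_ratios(results), 3))
# A 0.182 0.55
# B 0.409 0.45
```

The averages-of-numbers figure is dominated by queries with many relevant items, whereas the averages-of-ratios figure gives every query equal weight; that asymmetry is what allows the two rankings to disagree.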
The most extreme difference concerns strategy GTWC for relevance R1 documents: by the 'averages of numbers' calculation GTWC is third best, but by the 'averages of ratios' calculation it is in sixth position. Other, less drastic differences are also apparent. Incidentally, comparing the 'averages of numbers' results in Table 14.4 with the run 5 data in Table 14.3 confirms that using the two additional cutoff points in the calculation of normalized recall has not changed the relative positions of any of the search strategies.

TABLE 14.4. Run 5 data: ranking of search strategies by normalized recall (based on 11 cutoff points; averages of numbers and averages of ratios). Each entry gives the search strategy followed by its normalized recall.

Order of   Relevance R1 documents              Relevance R1/2 documents
merit      Average of nos.  Average of ratios  Average of nos.  Average of ratios
1          TWC (49.3)       TWC (54.3)         GWC (41.5)       GWC (47.7)
2          GWC (48.3)       GWC (50.9)         TWC (41.1)       TWC (47.1)
3          GTWC (46.8)      CRTW (50.7)        CGW (39.1)       CGW (46.7)
4          CRTW (44.7)      CTW (49.7)         CTW (38.7)       CTW (45.6)
5          CTW (44.4)       CGW (47.7)         CRTW (38.0)      CG (44.6)
6          CGW (44.4)       GTWC (47.1)        GTWC (37.8)      CRTW (44.1)
7          CG (40.9)        CG (44.4)          CG (36.5)        CT (42.7)
8          CT (39.6)        CT (43.6)          CT (35.5)        GTWC (40.5)

The differences shown in Table 14.4 have still not been tested for statistical significance, so the run 5 data were analysed further: each search strategy was paired with every other in turn, and the normalized recall figures for the individual queries were tested for a significant difference using the sign test. Table 14.5 shows which search strategies are significantly different at the 0.1, 1, 2 and 5 per cent levels; where p > 0.05 the differences are treated as not significant. Perhaps with more confidence than before, it can be concluded that strategies TWC and GWC are the two best (but are not distinguishable from each other) and that CT and CG are inferior, particularly when only relevance R1 documents are considered.
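The pairwise comparison described above can be sketched as an exact two-sided sign test on per-query normalized recall. This is a minimal Python sketch under stated assumptions: the chapter does not say how tied queries were handled or whether a one- or two-sided test was used, so discarding ties and doubling the binomial tail are assumptions here, and the function names and per-query scores are hypothetical. The helper `significance_level` maps a p-value onto the 0.1, 1, 2 and 5 per cent levels used for Table 14.5 and returns None when p > 0.05.

```python
from math import comb
from typing import Optional

LEVELS = (0.001, 0.01, 0.02, 0.05)  # 0.1, 1, 2 and 5 per cent


def sign_test(scores_a: list[float], scores_b: list[float]) -> tuple[int, int, float]:
    """Exact two-sided sign test on paired per-query scores.

    Returns (plus, minus, p_value). Tied queries are discarded (an assumption).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired query by query")
    plus = sum(1 for a, b in zip(scores_a, scores_b) if a > b)
    minus = sum(1 for a, b in zip(scores_a, scores_b) if a < b)
    n = plus + minus
    if n == 0:
        return plus, minus, 1.0  # every query tied: no evidence of a difference
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)


def significance_level(p: float) -> Optional[float]:
    """Smallest tabulated level at which p is significant, or None if p > 0.05."""
    for level in LEVELS:
        if p <= level:
            return level
    return None


# Hypothetical per-query normalized recall (per cent) for two strategies.
strategy_a = [52.0, 48.5, 61.0, 40.0, 55.5, 47.0, 58.0, 44.0, 50.0, 39.0, 45.0]
strategy_b = [50.0, 47.0, 57.5, 41.5, 52.0, 45.0, 55.0, 44.0, 46.5, 37.0, 42.0]

plus, minus, p = sign_test(strategy_a, strategy_b)
print(plus, minus, p, significance_level(p))
# 9 1 0.021484375 0.05
```

Here one query is tied and is dropped, leaving 9 wins and 1 loss over 10 queries; under these assumptions the difference would count as significant at the 5 per cent level but not at the 2 per cent level.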