SP500215 NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Combining Evidence for Information Retrieval
N. Belkin, P. Kantor, C. Cool, R. Quatrain
National Institute of Standards and Technology, D. K. Harman

4 refer to combinations based upon the fixed groups. But the combination of groups that performs best for, say, Topic 57 need not be the one that performs best for Topic 72. In Tables 6 and 6a, the best possible combination is chosen for each topic individually. Note also that, in the INQUERY system, the "unweighted sum" corresponds to a symmetrical assignment of each weight to all formulations.

          1-way   2-way   3-way   4-way   5-way   fusion
1-way     -       3**     2**     1**     2**     8**
2-way     47**    -       5.5**   6**     5.5**   13**
3-way     48**    44.5**  -       9**     7**     18*
4-way     49**    44**    41**    -       8**     22.5
5-way     48**    44.5**  43**    42**    -       28
fusion    42**    37**    32*     27.5    22      -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better than 1-way 47 out of 50 times, or 1-way performed better than 2-way 3 out of 50 times.

          combi   best    comby   fusion
combi     -       8       4**     15
best      17      -       9       21**
comby     21**    16      -       20**
fusion    10      4**     (illegible)   -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. comby performed better than combi 21 times, or combi performed better than comby 4 times.
Table 6a. Number of times that one treatment for ad hoc topics performed better than another.

3.5 Query Combination and Data Fusion Results: Routing Topics

We ran further experiments on the routing queries, analogous to those we used for the ad hoc queries. Our first set of results shows the progressive effect of unweighted combination of query formulations, by level of combination, when average performance at each level is considered (tables 7 and 7a).
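The two ways of scoring a level of combination discussed here, averaging fixed groups of formulations over all topics versus choosing the best group separately for each topic, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes each formulation gives every retrieved document a numeric score, that unweighted combination simply adds those scores with equal weights (dividing by the number of formulations would give the same ranking), and that per-topic effectiveness figures are already available. The helper names, the formulation labels `A` and `B`, the topic labels and all numbers are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical effectiveness[topic][group]: one effectiveness figure (e.g.
# 11-point average precision) per topic for each group of formulations.
# Toy numbers for illustration only; these are not TREC-2 results.
EFFECTIVENESS = {
    "t1": {("A",): 0.30, ("B",): 0.10, ("A", "B"): 0.25},
    "t2": {("A",): 0.05, ("B",): 0.40, ("A", "B"): 0.30},
}


def combine_unweighted(score_lists):
    """Rank documents by the plain sum of their scores over formulations.

    Every formulation gets the same weight; a document missing from one
    formulation's list contributes 0 for it (an assumption of this sketch).
    """
    totals = {}
    for scores in score_lists:
        for doc, score in scores.items():
            totals[doc] = totals.get(doc, 0.0) + score
    # Highest total first; ties broken by document id for determinism.
    return sorted(totals, key=lambda doc: (-totals[doc], doc))


def groups_at_level(formulations, k):
    """All k-way groups, e.g. the 10 distinct 2-way groups of 5 formulations."""
    return list(combinations(sorted(formulations), k))


def fixed_group_mean(effectiveness, group):
    """Average over topics when one fixed group is used for every topic."""
    return mean(per_topic[group] for per_topic in effectiveness.values())


def level_mean(effectiveness, groups):
    """Average of the fixed-group means at one level of combination."""
    return mean(fixed_group_mean(effectiveness, g) for g in groups)


def per_topic_best_mean(effectiveness, groups):
    """Average over topics of the best group chosen separately per topic."""
    return mean(max(per_topic[g] for g in groups)
                for per_topic in effectiveness.values())


if __name__ == "__main__":
    print(combine_unweighted([{"d1": 0.6, "d2": 0.2}, {"d2": 0.5, "d3": 0.4}]))
    one_way = groups_at_level(["A", "B"], 1)
    print("level mean:", level_mean(EFFECTIVENESS, one_way))
    print("best fixed group:",
          max(fixed_group_mean(EFFECTIVENESS, g) for g in one_way))
    print("per-topic best:", per_topic_best_mean(EFFECTIVENESS, one_way))
```

In the toy data, formulation A is better for t1 and B for t2, so choosing per topic (mean 0.35) beats the best single fixed group (B, mean 0.25). Picking the maximum per topic can only match or exceed the best fixed group at the same level, which is why per-topic best figures sit above fixed-group averages.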
Again, as for the ad hoc queries (tables 4 and 4a), there is a progressive, significant effect of level of query combination. For the routing queries, data fusion appears to have a somewhat stronger effect than for ad hoc, being significantly better than 1-, 2- and 3-way combinations. It is of some interest to note that the overall level of performance for routing topics is much higher than for the ad hoc topics.

1-way    2-way    3-way    4-way    5-way    fusion
0.1763   0.2311   0.2599   0.2619   0.2807
0.1890   0.2202   0.2503   0.2748
0.1684   0.2258   0.2603   0.2735
0.2025   0.2229   0.2314   0.2512
0.1793   0.2436   0.2415   0.2745
         0.2364   0.2471
         0.2388   0.2509
         0.2160   0.2654
         0.2149   0.2642
         0.2338   0.2417
Each entry is an average over 50 topics.
Table 7. For routing topics, average 11-point precision, by group, for each combination of queries, and mean average precision for all groups at each level of combination.

Table 7a. Number of times, for average performance of combinations for routing topics, that one treatment performed better than another.

As for the ad hoc topics, we then compared the best query-formulation combinations at each level of combination with the unweighted 5-way combination and with the fusion results. As with the ad hoc queries, this gave quite a different ranking of the levels of combination: 3-way and 2-way combinations were significantly better than all others, and 4-way was significantly better than 5-way and fusion (tables 8 and 8a).

1-way    2-way    3-way    4-way    5-way    fusion
0.2931   0.3173   0.3199   0.3069   0.2807   0.2661
Table 8. For routing topics, mean 11-point precision for the best-performing combination of queries for each topic.
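Tables 7 and 8 report 11-point average precision. A minimal sketch of the conventional interpolated version follows: precision is interpolated at the recall levels 0.0, 0.1, ..., 1.0 as the highest precision reached at any rank whose recall is at least that level, and the 11 values are averaged. The function name and the toy ranking are hypothetical, and the evaluation software used for TREC-2 may differ in details such as the ranking depth it considers.

```python
def eleven_point_average_precision(ranking, relevant):
    """Interpolated 11-point average precision for one topic.

    ranking:  document ids in retrieved order, best first.
    relevant: set of ids judged relevant for the topic.
    """
    if not relevant:
        return 0.0
    # (recall, precision) at each rank where a relevant document appears;
    # interpolated precision always peaks at such ranks.
    points = []
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Highest precision at any recall >= level; 0 if never reached.
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Toy topic: two relevant documents, retrieved at ranks 1 and 3.
    print(eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

For the toy topic, interpolated precision is 1.0 up to recall 0.5 and 2/3 beyond it, so the result is (6 × 1.0 + 5 × 2/3) / 11, about 0.848. A per-topic figure like this, averaged over the 50 topics, is the kind of number reported in each table cell.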
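Tables 7a and 8a count, for each pair of treatments, the number of topics on which one beat the other, and mark differences using a sign test. A minimal sketch of an exact two-sided sign test follows. It assumes tied topics are dropped and that the p-value is twice the smaller binomial tail, capped at 1. The half-counts in the tables suggest the authors split ties between treatments instead, and the paper does not state its exact computation, so the helper names `sign_test_p` and `stars` and these conventions are assumptions.

```python
from math import comb


def sign_test_p(wins: int, losses: int) -> float:
    """Exact two-sided sign test p-value for paired per-topic comparisons.

    wins/losses count topics where treatment X beat Y and vice versa;
    ties are assumed to have been dropped beforehand. Under the null
    hypothesis each non-tied topic is a fair coin flip (p = 0.5).
    """
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def stars(p: float) -> str:
    """Mark significance as the tables do: ** for p < .01, * for p < .05."""
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""


if __name__ == "__main__":
    # (47, 3) and (21, 4) are counts quoted in the ad hoc tables' reading
    # guides; (15, 10) is a hypothetical split for comparison.
    for wins, losses in [(47, 3), (21, 4), (15, 10)]:
        p = sign_test_p(wins, losses)
        print(f"{wins} vs {losses}: p = {p:.4g} {stars(p)}")
```

For 47 versus 3 and 21 versus 4 the sketch gives p < .01, matching the double stars those counts carry, while a 15-to-10 split gives p well above .05.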
          1-way   2-way   3-way   4-way   5-way   fusion
1-way     -       8.5**   13.5**  22      29      36**
2-way     41.5**  -       20.5    34*     38**    39**
3-way     36.5**  29.5    -       37**    42**    45**
4-way     28      16*     13**    -       44**    40**
5-way     21      12**    8**     6**     -       28
fusion    14**    11**    5**     10**    22      -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better than 1-way 41.5 times, or 1-way performed better than 2-way 8.5 times.
Table 8a. Number of times, for performance of best combinations for routing topics, that one treatment performed better than another.

3.6 Adaptive Combination: Routing Topics

Finally, we wished to investigate the effectiveness of progressively taking account of retrieval performance in