SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Combining Evidence for Information Retrieval
chapter
N. Belkin
P. Kantor
C. Cool
R. Quatrain
National Institute of Standards and Technology
D. K. Harman
4 refer to combinations based upon the fixed groups. However,
the combination of groups which performs best for, say,
Topic 57 need not be the one which performs best for
Topic 72. In Tables 6 and 6a, the best possible combination
is chosen for each topic individually. Note also that,
in the INQUERY system, the "unweighted sum" corresponds
to a symmetrical (equal) assignment of weight to all
formulations.
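The difference between a fixed combination and a per-topic best combination can be illustrated with a minimal sketch. The combination labels and precision values below are hypothetical and are not taken from the experiments; only the two topic numbers echo the example in the text.

```python
from statistics import mean

# Hypothetical per-topic average precision for three query combinations.
# Illustrative numbers only; none of them come from the paper.
scores = {
    "A+B": {57: 0.30, 72: 0.10},
    "A+C": {57: 0.20, 72: 0.25},
    "B+C": {57: 0.15, 72: 0.20},
}
topics = [57, 72]


def best_fixed(scores, topics):
    """Mean precision of the single combination that is best on average."""
    return max(mean(per_topic[t] for t in topics) for per_topic in scores.values())


def best_per_topic(scores, topics):
    """Mean precision when the best combination is chosen for each topic."""
    return mean(max(per_topic[t] for per_topic in scores.values()) for t in topics)


print(best_fixed(scores, topics))      # one combination used for every topic
print(best_per_topic(scores, topics))  # combination chosen topic by topic
```

Choosing per topic can never score lower than the best fixed choice, which is why results of this kind are an upper bound rather than something achievable without knowing the relevance judgments in advance.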
          1-way    2-way    3-way    4-way    5-way    fusion
1-way       -      3**      2**      1**      2**      8**
2-way     47**      -       5.5**    6**      5.5**    13**
3-way     48**    44.5**     -       9**      7**      18*
4-way     49**    44**     41**       -       8**      22.5
5-way     48**    44.5**   43**     42**       -       28
fusion    42**    37**     32*      27.5     22         -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better
than 1-way 47 out of 50 times, or 1-way performed better than
2-way 3 out of 50 times.

          combi    best     comby    fusion
combi       -      8        4**      15
best      17        -       9        21**
comby     21**    16         -       20**
fusion    10       4**     [OCRerr]   -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. comby performed better
than combi 21 times, or combi performed better than comby
4 times.
Table 6a. Number of times that one treatment for ad hoc
topics performed better than another.
3.5 Query Combination and Data Fusion Results: Routing Topics
We ran further experiments on the routing queries,
analogous to those we used for the ad hoc queries. Our first
set of results shows the progressive effect of unweighted
combination of query formulations, by level of combination,
when average performance at each level is considered
(Tables 7 and 7a). As for the ad hoc queries (Tables 4
and 4a), there is a progressive, significant effect of level of
query combination. For the routing queries, data fusion appears
to have a somewhat stronger effect than for the ad hoc queries,
being significantly better than 1-, 2- and 3-way combination.
Notably, the overall level of performance for routing topics
is much higher than for the ad hoc topics.
1-way     2-way     3-way     4-way     5-way     fusion
0.1763    0.2311    0.2599    0.2619    0.2807
0.1890    0.2202    0.2503    0.2748
0.1684    0.2258    0.2603    0.2735
0.2025    0.2229    0.2314    0.2512
0.1793    0.2436    0.2415    0.2745
          0.2364    0.2471
          0.2388    0.2509
          0.2160    0.2654
          0.2149    0.2642
          0.2338    0.2417
Each entry is an average over 50 topics.
Table 7. For routing topics, average 11-point precision,
by group, for each combination of queries, and mean aver-
age precision for all groups at each level of combination.
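The number of entries at each level of Table 7 follows from the number of ways of choosing query groups. A minimal sketch, assuming five query formulation groups (as the 5-way level implies) and using hypothetical labels g1 to g5:

```python
from itertools import combinations

# Hypothetical labels for the five query formulation groups.
groups = ["g1", "g2", "g3", "g4", "g5"]

# All k-way combinations of groups, for k = 1..5.
levels = {k: list(combinations(groups, k)) for k in range(1, len(groups) + 1)}

# Number of distinct combinations at each level.
counts = {k: len(combos) for k, combos in levels.items()}
print(counts)  # {1: 5, 2: 10, 3: 10, 4: 5, 5: 1}
```

So a table of this kind holds 5 one-way, 10 two-way, 10 three-way, 5 four-way and 1 five-way entries, 31 in all.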
Table 7a. Number of times, for average performance of
combinations for routing topics, that one treatment per-
formed better than another.
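Pairwise counts of this kind can be produced by comparing two treatments topic by topic. The sketch below is illustrative only: the scores are hypothetical, and counting a tie as half a win for each side is an assumption suggested by the half-integer entries in the tables, not a procedure stated in the text.

```python
def win_matrix(scores):
    """Count, for each ordered pair (a, b), the topics on which a beats b.

    scores maps a treatment name to a list of per-topic values, with all
    lists in the same topic order. Ties count 0.5 for each side (assumed).
    """
    names = list(scores)
    wins = {a: {b: 0.0 for b in names if b != a} for a in names}
    for a in names:
        for b in names:
            if a == b:
                continue
            for x, y in zip(scores[a], scores[b]):
                if x > y:
                    wins[a][b] += 1.0
                elif x == y:
                    wins[a][b] += 0.5
    return wins


# Hypothetical precision values for four topics.
example = {
    "1-way": [0.1, 0.2, 0.3, 0.4],
    "2-way": [0.2, 0.2, 0.4, 0.3],
}
matrix = win_matrix(example)
print(matrix["2-way"]["1-way"])  # 2.5: two wins plus half of one tie
print(matrix["1-way"]["2-way"])  # 1.5: one win plus half of one tie
```

With this convention the two entries for any pair of treatments always sum to the number of topics.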
As for the ad hoc topics, we then compared the results
of the best query formulation combinations at each level of
combination with the unweighted 5-way combination and the
fusion results. Again, this gave us quite a different ranking
of levels of combination, with 3-way and 2-way combinations
being significantly better than all others, and 4-way being
significantly better than 5-way and fusion (Tables 8 and 8a).
1-way     2-way     3-way     4-way     5-way     fusion
0.2931    0.3173    0.3199    0.3069    0.2807    0.2661
Table 8. For routing topics, mean 11-point precision for
best-performing combination of queries for each topic.
          1-way    2-way    3-way    4-way    5-way    fusion
1-way       -      8.5**    13.5**   22       29       36**
2-way     41.5**    -       20.5     34*      38**     39**
3-way     36.5**  29.5       -       37**     42**     45**
4-way     28      16*      13**       -       44**     40**
5-way     21      12**      8**      6**       -       28
fusion    14**    11**      5**     10**      22        -
** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better
than 1-way 41.5 times, or 1-way performed better than 2-way
8.5 times.
Table 8a. Number of times, for performance of best combinations
for routing topics, that one treatment performed
better than another.
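The sign test used for these comparisons can be sketched as an exact binomial test on the win counts. This is a minimal illustration, not the authors' procedure: whether the test was one-sided or two-sided, and how the tied topics behind the half counts were handled, is not stated here, so the two-sided form with integer counts below is an assumption.

```python
from math import comb


def sign_test_p(wins, losses):
    """Two-sided exact sign test p-value for integer win/loss counts.

    Under the null hypothesis each topic is equally likely to favour
    either treatment, so the larger count is Binomial(n, 0.5).
    """
    n = wins + losses
    k = max(wins, losses)
    # Probability of a split at least as lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    # Double it for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)


# Counts taken from Table 8a: 3-way vs. fusion, and 5-way vs. fusion.
print(sign_test_p(45, 5) < 0.01)   # True: a 45-to-5 split is highly unlikely by chance
print(sign_test_p(28, 22) > 0.05)  # True: a 28-to-22 split is consistent with chance
```

Because the sign test uses only the direction of each per-topic difference and not its size, it makes no assumption about how precision values are distributed.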
3.6 Adaptive Combination: Routing Topics
Finally, we wished to investigate the effectiveness of
progressively taking account of retrieval performance in