Combining Evidence for Information Retrieval
N. Belkin, P. Kantor, C. Cool, R. Quatrain
In: D. K. Harman (ed.), The Second Text REtrieval Conference (TREC-2), NIST Special Publication 500-215, National Institute of Standards and Technology.
        1-way    2-way    3-way    4-way    5-way    fusion
        0.1441   0.2016   0.2235   0.2361   0.2349   0.2042
        0.1571   --       0.2304   0.2225
        0.1121   0.2051   0.2102   0.2292
        0.1589   0.1951   0.2043   0.2200
        0.1378   0.1763   0.2079   0.2166
                 0.2113   0.2171
                 0.1727   0.2172
                 0.1683   0.1873
                 0.1633   0.2116
                 0.1885   0.1934
mean    0.1420   0.1864   0.2103   0.2249   0.2349   0.2042
For example, column two represents the ten possible ways of
choosing two groups of query formulations from the collection
of five groups. Each entry is an average over 25 topics.
Table 4. For ad hoc topics, average 11-point precision, by group, for each combination of queries, and mean average precision for all groups at each level of combination.
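As a concrete illustration of how the Table 4 columns are organized, the Python sketch below enumerates the C(5, k) possible k-way combinations of the five groups of query formulations and averages each combination's 11-point precision over the 25 topics. The names GROUPS, TOPICS, and eleven_pt_avg_precision are hypothetical placeholders; the actual retrieval and combination machinery used in the experiments is not shown.

    from itertools import combinations
    from statistics import mean

    GROUPS = ("A", "B", "C", "D", "E")   # hypothetical labels for the five groups
    TOPICS = range(1, 26)                # the 25 ad hoc topics

    def eleven_pt_avg_precision(group_subset, topic):
        """Placeholder: 11-point average precision obtained when the query
        formulations of `group_subset` are combined and run on `topic`."""
        raise NotImplementedError

    def table4_column(k):
        """One entry per k-way combination (averaged over the 25 topics),
        plus the column mean reported in the bottom row of Table 4."""
        entries = [mean(eleven_pt_avg_precision(subset, t) for t in TOPICS)
                   for subset in combinations(GROUPS, k)]   # C(5, k) subsets
        return entries, mean(entries)

For k = 2 this yields the ten entries of column two; for the 5-way and fusion columns there is only a single entry.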
         1-way    2-way    3-way    4-way    5-way    fusion
1-way             1**      3**      3**      3**      5**
2-way    24**              5**      6**      5**      9
3-way    22**     20**              6**      3.5**    11
4-way    22**     19**     19**              3**      12
5-way    22**     20**     21.5**   22**              15
fusion   20**     16       14       13       10
**= significant difference at p < .01, sign test
*= significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better
than 1-way 24 out of 25 times, or 1-way performed better than
2-way 1 out of 25 times
Table 4a. Number of times, for average performance of combinations for ad hoc topics, that one treatment performed better than another.
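A minimal sketch of how the pairwise win counts and the accompanying sign test behind Table 4a might be computed is given below. The paper does not state how ties were handled or whether the test was one- or two-sided; the half-counts in the table suggest ties were split 0.5/0.5, which the sketch assumes, and a two-sided test is used here.

    from math import comb

    def pairwise_counts(perf_x, perf_y):
        """Over the 25 topics, how often treatment X out-performed treatment Y;
        ties are split 0.5/0.5, matching the half-counts in Tables 4a and 5a."""
        wins_x = sum(1.0 if x > y else 0.5 if x == y else 0.0
                     for x, y in zip(perf_x, perf_y))
        return wins_x, len(perf_x) - wins_x

    def sign_test_p(perf_x, perf_y):
        """Two-sided sign test p-value, computed on the non-tied topics."""
        diffs = [x - y for x, y in zip(perf_x, perf_y) if x != y]
        n, k = len(diffs), sum(d > 0 for d in diffs)
        tail = sum(comb(n, i) for i in range(max(k, n - k), n + 1)) / 2 ** n
        return min(1.0, 2 * tail)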
The results presented in Tables 4 and 4a are based on the average performance for the query formulations in any one set. In Tables 5 and 5a, we present data on performance, for ad hoc topics, when only the best query formulation, or best combination of query formulations, for each topic is used. These results are compared with the single 5-way combination (which is the only combination possible at this level with our data), and with the fusion results. It is of some interest to note that the ranking of level of combination is now very much different from that for average performance, with 2-way and 3-way combination being significantly better than 1-way, 4-way, 5-way and fusion (see Table 5a).
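The difference between Tables 4 and 5 can be sketched as follows: instead of averaging a fixed combination over all topics, the best-scoring k-way combination is chosen separately for each topic before averaging. The lookup table avg_prec and the group labels below are hypothetical placeholders.

    from itertools import combinations
    from statistics import mean

    GROUPS = ("A", "B", "C", "D", "E")   # hypothetical labels for the five groups
    TOPICS = range(1, 26)                # the 25 ad hoc topics

    def best_combination_precision(k, avg_prec):
        """Table 5-style entry for level k: for each topic, take the best-scoring
        k-way combination of query formulations, then average over the topics.
        avg_prec[(subset, topic)] is assumed to hold 11-point precision values."""
        per_topic_best = [
            max(avg_prec[(subset, t)] for subset in combinations(GROUPS, k))
            for t in TOPICS
        ]
        return mean(per_topic_best)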
1-way    2-way    3-way    4-way    5-way    fusion
0.2712   0.3002   0.2959   0.2702   0.2350   0.2042
Table 5. For ad hoc topics, mean 11-point precision for best-performing combination of queries for each topic.
         1-way    2-way    3-way    4-way    5-way    fusion
1-way             6**      7*       13       17       21**
2-way    19**              16.5     20**     22**     24**
3-way    18*      8.5               20.5**   22**     22**
4-way    12       5**      4.5**             22.5**   20**
5-way    8        3**      3**      2.5**             15
fusion   4**      1**      3**      5**      10
**= significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. 2-way performed better
than 1-way 19 out of 25 times, or 1-way performed better than
2-way 6 out of 25 times
Table 5a. Number of times, for performance of best combinations for ad hoc topics, that one treatment performed better than another.
3.4 Adaptive Combination: Ad hoc Topics
Finally, to get an overall idea of how query combination in the ad hoc case worked, and to estimate whether taking account of the evidence of search performance could improve subsequent performance, we compared the performance of the simple combination of all five query formulations (combi) with performance when only the best single query formulation for each topic was used (best), and with the combination of all five query formulations weighted according to the precision at 100 documents retrieved of each formulation (comby). The results, reported in Tables 6 and 6a, show that there is no significant difference between combi and best, but that comby is significantly better than combi. While formation of comby would not be possible under the conditions of the ad hoc TREC task, these results are of interest because they simulate the kind of operations that could be implemented in a fully interactive interface to an IR system.
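As an illustration of the three treatments, the sketch below fuses the five formulations' document scores by weighted summation: combi uses equal weights, while comby weights each formulation by its precision at 100 documents retrieved. The score-summation rule and the data structures are assumptions for illustration; the section does not spell out the combination operator itself.

    def fuse(doc_scores_by_query, weights):
        """Combine several ranked lists by weighted score summation (an assumed
        fusion rule); doc_scores_by_query[q] maps document ids to scores."""
        fused = {}
        for q, doc_scores in doc_scores_by_query.items():
            for doc, score in doc_scores.items():
                fused[doc] = fused.get(doc, 0.0) + weights[q] * score
        return sorted(fused, key=fused.get, reverse=True)   # ranked document ids

    def combi(doc_scores_by_query):
        """Unweighted combination of all five query formulations."""
        return fuse(doc_scores_by_query, {q: 1.0 for q in doc_scores_by_query})

    def comby(doc_scores_by_query, prec_at_100):
        """Combination weighted by each formulation's precision at 100 documents."""
        return fuse(doc_scores_by_query, prec_at_100)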
combi    best     comby    fusion
0.2350   0.2712   0.2819   0.2042
combi = unweighted combination of all queries for each topic
best = best performing query for each topic
comby = weighted (by prec. at 100 docs) combination of all
queries for each topic
Table 6. For ad hoc topics, mean 11-point precision for
four treatments.
In reading Tables 6 and 6a, note that the choice referred
to as "best" corresponds exacdy to the choice called "1-way"
in Table 5. However, it does not correspond to any of the
entries in the first column of Table 4. The entries in Table