          1-way     2-way     3-way     4-way     5-way     fusion
          0.1441    0.2016    0.2235    0.2361    0.2349    0.2042
          0.1571    0.182?    0.2304    0.2225
          0.1121    0.2051    0.2102    0.2292
          0.1589    0.1951    0.2043    0.2200
          0.1378    0.1763    0.2079    0.2166
                    0.2113    0.2171
                    0.1727    0.2172
                    0.1683    0.1873
                    0.1633    0.2116
                    0.1885    0.1934
mean      0.1420    0.1864    0.2103    0.2249    0.2349    0.2042

For example, column two represents the ten possible ways of choosing two groups of query formulations from the collection of five groups. Each entry is an average over 25 topics.

Table 4. For ad hoc topics, average 11-point precision, by group, for each combination of queries, and mean average precision for all groups at each level of combination.

          1-way    2-way    3-way    4-way    5-way    fusion
1-way              1**      3**      3**      3**      5**
2-way     24**              5**      6**      5**      9
3-way     22**     20**              6**      3.5**    11
4-way     22**     19**     19**              3**      12
5-way     22**     20**     21.5**   22**              15
fusion    20**     16       14       13       10

** = significant difference at p < .01, sign test
*  = significant difference at p < .05, sign test
Read row with respect to column, e.g., 2-way performed better than 1-way 24 out of 25 times, or 1-way performed better than 2-way 1 out of 25 times.

Table 4a. Number of times, for average performance of combinations for ad hoc topics, that one treatment performed better than another.

The results presented in Tables 4 and 4a are based on the average performance of the query formulations in any one set. In Tables 5 and 5a, we present data on performance, for ad hoc topics, when only the best query formulation, or best combination of query formulations, for each topic is used. These results are compared with the single 5-way combination (which is the only combination possible at this level with our data), and with the fusion results. It is of some interest to note that the ranking of levels of combination is now very different from that for average performance, with 2-way and 3-way combination being significantly better than 1-way, 4-way, 5-way and fusion (see Table 5a).

1-way     2-way     3-way     4-way     5-way     fusion
0.2712    0.3002    0.2959    0.2702    0.2350    0.2042

Table 5. For ad hoc topics, mean 11-point precision for the best-performing combination of queries for each topic.

          1-way    2-way    3-way    4-way    5-way    fusion
1-way              6**      7*       13       17       21**
2-way     19**              16.5     20**     22**     24**
3-way     18*      8.5               20.5*    22**     22**
4-way     12       5**      4.5*              22.5**   20**
5-way     8        3**      3**      2.5**             15
fusion    4**      1**      3**      5**      10

** = significant difference at p < .01, sign test
*  = significant difference at p < .05, sign test
Read row with respect to column, e.g., 2-way performed better than 1-way 19 out of 25 times, or 1-way performed better than 2-way 6 out of 25 times.

Table 5a. Number of times, for performance of best combinations for ad hoc topics, that one treatment performed better than another.
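Tables 4a and 5a tally, for each pair of treatments, on how many of the 25 topics one treatment scored higher than the other, with significance assessed by a sign test. As a concrete illustration, the Python sketch below computes such pairwise counts and a two-sided sign-test p-value from per-topic scores; the scores shown are hypothetical, and splitting ties 0.5/0.5 (which would yield fractional counts such as 21.5 and 3.5) is an assumption about how the tables were tallied, not a detail stated in the paper.

```python
import math

def sign_counts(scores_a, scores_b):
    """Per-topic comparison of two treatments: wins for A, wins for B, ties."""
    wins_a = sum(a > b for a, b in zip(scores_a, scores_b))
    wins_b = sum(b > a for a, b in zip(scores_a, scores_b))
    ties = len(scores_a) - wins_a - wins_b
    return wins_a, wins_b, ties

def sign_test_p(wins_a, wins_b):
    """Two-sided sign test: ties excluded, tail probability under Binomial(n, 0.5)."""
    n = wins_a + wins_b
    k = min(wins_a, wins_b)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical per-topic average precision for two treatments over 25 topics
# (illustrative values only, not the paper's data).
one_way = [0.10 + 0.01 * (i % 6) for i in range(25)]
two_way = [0.15 + 0.01 * (i % 9) for i in range(25)]

w2, w1, t = sign_counts(two_way, one_way)
# Table-style entries: ties split 0.5/0.5, which can produce fractional
# counts like the 21.5 / 3.5 pair seen in Tables 4a and 5a.
print("2-way better:", w2 + t / 2, "  1-way better:", w1 + t / 2)
print("two-sided sign-test p =", sign_test_p(w2, w1))
```

Under this reading, complementary cells of Tables 4a and 5a sum to 25, with a tie contributing 0.5 to each side of the comparison.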
3.4 Adaptive Combination: Ad hoc Topics

Finally, to get an overall idea of how query combination in the ad hoc case worked, and to estimate whether taking account of the evidence of search performance could improve subsequent performance, we compared the performance of the simple combination of all five query formulations (comb1) with performance when only the best single query formulation for each topic was used (best), and with the combination of all five query formulations weighted according to each formulation's precision at 100 documents retrieved (comby). The results, reported in Tables 6 and 6a, show that there is no significant difference between comb1 and best, but that comby is significantly better than comb1. While formation of comby would not be possible under the conditions of the ad hoc TREC task, these results are of interest because they simulate the kind of operations that could be implemented in a fully interactive interface to an IR system.

comb1     best      comby     fusion
0.2350    0.2712    0.2819    0.2042

comb1 = unweighted combination of all queries for each topic
best  = best-performing query for each topic
comby = weighted (by precision at 100 documents) combination of all queries for each topic

Table 6. For ad hoc topics, mean 11-point precision for four treatments.

In reading Tables 6 and 6a, note that the choice referred to as "best" corresponds exactly to the choice called "1-way" in Table 5. However, it does not correspond to any of the entries in the first column of Table 4.
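For concreteness, the sketch below shows one way a precision-weighted combination like comby might be implemented, assuming each query formulation returns a {doc_id: score} map and that combination is a weighted linear sum of document scores. The helper names, the tiny runs, and the cutoff used in the example are illustrative assumptions; the paper's own combination operator is not reproduced here.

```python
from collections import defaultdict

def precision_at_k(ranked_docs, relevant, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(d in relevant for d in ranked_docs[:k]) / k

def weighted_combination(runs, weights):
    """Combine runs by a weighted sum of per-document scores.

    runs:    list of {doc_id: score} maps, one per query formulation
    weights: one weight per run (here, its precision at a fixed cutoff)
    Returns document ids ranked by the combined score.
    """
    combined = defaultdict(float)
    for scores, w in zip(runs, weights):
        for doc, score in scores.items():
            combined[doc] += w * score
    return sorted(combined, key=combined.get, reverse=True)

# Hypothetical example: two tiny runs for one topic (the paper weights by
# precision at 100 documents; k=2 is used here only to keep the example small).
run_a = {"d1": 0.9, "d2": 0.7, "d3": 0.2}
run_b = {"d2": 0.8, "d4": 0.6, "d1": 0.1}
relevant = {"d1", "d2"}

weights = [precision_at_k(sorted(r, key=r.get, reverse=True), relevant, k=2)
           for r in (run_a, run_b)]
print(weights)                                        # [1.0, 0.5]
print(weighted_combination([run_a, run_b], weights))  # ['d2', 'd1', 'd4', 'd3']
```

In this scheme a formulation that retrieves few relevant documents in its top ranks contributes little to the combined ranking, which is the intuition behind weighting by early precision.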