NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Combining Evidence for Information Retrieval
N. Belkin, P. Kantor, C. Cool, R. Quatrain
National Institute of Standards and Technology
D. K. Harman
modification of the query formulation. To do this, we
compared the performance of the unweighted 5-way query
combination (comb1) with performance using the best-performing
query formulations in the training database (best1), the best-performing
query formulations in the test database (best2),
the weighted 5-way query combination using weights from
the training database (combx), the weighted 5-way query
combination using weights from the test database (comby),
and the 5-way query combination weighted by the mean of the
weights for the test and training databases (combxy). The weights
that we used were the precision at 100 retrieved documents for
each query formulation. In the official results, we used average
11-point precision. The reason for the change is that
precision at some cutoff level is a realistic measure for the
routing task in general, and especially in an operational
environment, whereas average precision is a measure that
we cannot realistically expect to have in an operational
environment. When we compared the performance of both
weights in the combx formulation, there was no significant
difference. The results are presented in Tables 9 and 9a, and
show that taking account of subsequent evidence has a positive
and significant effect on performance. When reading
Tables 9 and 9a, note that the entries for comb1 and fusion
have already appeared in Table 7, as "5-way" and "fusion",
respectively. Also, "best2" has already appeared in Table 8,
as the best "1-way" combination.
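The combination schemes compared here can be sketched in a few lines. The following is an illustrative reconstruction, not the paper's code: function and variable names are our own, and we assume each query formulation produces a per-document retrieval score, with the combined score being a weighted sum across formulations.

```python
def precision_at(ranked_docs, relevant, cutoff=100):
    """Fraction of the top `cutoff` retrieved documents that are relevant.

    This is the precision-at-100-documents measure the paper uses as a
    formulation weight. `ranked_docs` is a ranked list of doc ids;
    `relevant` is the set of relevant doc ids.
    """
    top = ranked_docs[:cutoff]
    return sum(1 for d in top if d in relevant) / cutoff

def combine(scores_per_query, weights=None):
    """Weighted sum of per-query document scores.

    scores_per_query: list of {doc_id: score} dicts, one per query
    formulation. weights: one weight per formulation; None gives the
    unweighted combination (comb1-style). Returns doc ids ranked by
    combined score, best first.
    """
    if weights is None:
        weights = [1.0] * len(scores_per_query)
    combined = {}
    for w, scores in zip(weights, scores_per_query):
        for doc, s in scores.items():
            combined[doc] = combined.get(doc, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)
```

Under this sketch, combx would pass each formulation's precision at 100 documents on the training set as its weight, comby the corresponding test-set values, and combxy the mean of the two.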
         comb1    best1    best2    combx    comby    combxy   fusion
comb1      -      29       21       13.5**   16*      14.5**   28
best1     21       -       13**     14**     14**     12.5**   22.5
best2     29      37**      -       23       20       23       36**
combx     36.5**  36**     27        -       21.5     18**     40**
comby     34*     36**     30       28.5      -       25.5     36.5**
combxy    35.5**  37.5**   27       32*      24.5      -       37**
fusion    22      27.5     14**     10**     13.5**   13**      -

** = significant difference at p < .01, sign test
* = significant difference at p < .05, sign test
Read row with respect to column, e.g. combx performed better
than comb1 36.5 times, or comb1 better than combx 13.5
times.
Table 9a. Number of times that one treatment for routing
topics performed better than another.
comb1    best1    best2    combx    comby    combxy   fusion
.2807    .2721    .2931    .3012    .3090    .3068    .2661

comb1 = unweighted combination of all queries for each topic
best1 = best performing query (on training set) for each topic
best2 = best performing query (on test set) for each topic
combx = weighted (by prec. @ 100 docs in training set) combination
of all queries for each topic
comby = weighted (by prec. @ 100 docs in test set) combination
of all queries for each topic
combxy = weighted (by mean of prec. @ 100 docs in training
and test sets) combination of all queries for each topic
Table 9. For routing topics, mean 11-point precision for
seven treatments.
Table 9a encapsulates all of the key concepts of the
several approaches to combination that we have explored.
We have two approaches which are a priori and symmetric
in their treatment of the query formulations (fusion and
comb1). As expected, the fusion system, using the least
information, performs worse. comb1, the symmetric
formulation, does better, although the difference is not
statistically significant. Both of these methods often perform
better than the best of the individual formulations, and their
relations to the other combination schemes are (except for the
relation to best2) quite similar. The query that performs
best on the training set (best1) does not perform significantly
better than any of the combination schemes. But
the formulation which performs best on the test set (best2,
also called 1-way in Table 8) is significantly better than
best1 and the fusion scheme.
Of greater interest are the methods representing adaptive
weighting schemes: combx, comby and combxy. Most
significantly, combx, the adaptive weighting formulation,
is better than the symmetrically weighted combination
(comb1), the fusion rule, and the best single formulation in
a substantial fraction (over 70%) of all cases. The weighting
based on the test set (comby) stands in essentially
the same relation to those three other schemes. Finally, the
weighting scheme combxy simulates a situation which
might arise in updating or tuning a combination rule after
two batches of documents have been retrieved. This is
accomplished by averaging the weights assigned to each
formulation in the training run with those assigned based on
the test run. This scheme shows essentially the same profile
as combx and comby when compared with the comb1,
fusion, best1 and best2 schemes. It performs significantly
better than combx, but not significantly better than comby.
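The averaging step behind this kind of scheme reduces to one line; the following sketch uses made-up placeholder weights, not values from the paper, purely to illustrate the update.

```python
# Hypothetical combxy-style weight update: average each formulation's
# training-set weight (prec. @ 100 docs) with its test-set weight.
# The numbers below are illustrative placeholders, not the paper's data.
w_train = [0.30, 0.22, 0.26, 0.24, 0.32]   # one weight per formulation
w_test  = [0.28, 0.26, 0.24, 0.28, 0.34]
w_xy = [(wt + ws) / 2 for wt, ws in zip(w_train, w_test)]
```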
4. Discussion
4.1 General Results
As is customary, we begin this section with a general
disclaimer. In this case, we need to point out that all of our
results were obtained with a very specific kind of query
formulation technique and very special kinds of queries, and
that all of our results were obtained within a very special
retrieval context, the INQUERY system. It is certainly possible
that these circumstances strongly affected our results,
so that we cannot make widely general claims for them.
On the other hand, the results reported by Fox and Shaw
(this volume), using queries generated in quite different
ways, and using a quite different IR system and retrieval
technique, are quite similar in general form and trend to