NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
National Institute of Standards and Technology
D. K. Harman, editor

Combination of Multiple Searches
E. Fox and J. Shaw
Table 5: Average Precision and Exact R-Precision for the five individual runs (Ad-hoc Topics 51-100).

Average non-interpolated Precision

          ---------------- Disk 1 ----------------  ------------ Disk 2 ------------
Run       AP      DOE     FR      WSJ     ZF      AP      FR      WSJ     ZF      Both Disks
SV        0.2387  0.0605  0.0222  0.2203  0.1026  0.2543  0.0330  0.1503  0.0770  0.1418
LV        0.2435  0.0586  0.0302  0.2414  0.0864  0.2664  0.0324  0.1633  0.0753  0.1555
Pn1.0     0.2605  0.0658  0.0611  0.2941  0.1110  0.3004  0.0879  0.2206  0.1003  0.1988
Pn1.5     0.2939  0.0771  0.0639  0.3199  0.1278  0.3332  0.0878  0.2327  0.1065  0.2242
Pn2.0     0.2849  0.0847  0.0706  0.3217  0.1278  0.3300  0.0865  0.2325  0.1136  0.2250
CombSUM   0.3493  0.1001  0.0741  0.3605  0.1475  0.3748  0.0842  0.2752  0.1273  0.2620
Chg/Max   18.84%  18.18%  4.95%   12.06%  15.41%  12.48%  -4.20%  18.26%  12.05%  16.44%

Exact R-Precision

          ---------------- Disk 1 ----------------  ------------ Disk 2 ------------
Run       AP      DOE     FR      WSJ     ZF      AP      FR      WSJ     ZF      Both Disks
SV        0.2624  0.0564  0.0183  0.2616  0.1180  0.2649  0.0202  0.1744  0.0922  0.2169
LV        0.2672  0.0493  0.0274  0.2800  0.0802  0.2704  0.0176  0.1860  0.0843  0.2311
Pn1.0     0.2688  0.0661  0.0533  0.3221  0.1123  0.3165  0.0971  0.2367  0.0969  0.2708
Pn1.5     0.2976  0.0762  0.0572  0.3443  0.1218  0.3412  0.1016  0.2511  0.1068  0.2962
Pn2.0     0.2968  0.0765  0.0654  0.3470  0.1254  0.3339  0.0820  0.2442  0.1158  0.3008
CombSUM   0.3590  0.0950  0.0619  0.3767  0.1357  0.3732  0.0887  0.2851  0.1216  0.3292
Chg/Max   20.63%  24.18%  -5.35%  8.55%   8.21%   9.37%   -12.69% 13.54%  5.00%   9.44%
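For reference, the Chg/Max rows can be read as the relative change of the CombSUM result with respect to the best of the five individual runs in the same column (an interpretation added here; it is consistent with the tabulated values). For the Disk 1 AP column of the average-precision table, the best individual run is Pn1.5 at 0.2939, so

\[
\mathrm{Chg/Max} = \frac{0.3493 - 0.2939}{0.2939} \approx 18.8\%,
\]

which agrees with the tabulated 18.84% up to rounding of the published figures; negative entries (e.g. FR on Disk 2) mark columns where CombSUM falls below the best individual run.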
Table 6. The rationale behind the CombMIN combination method was to minimize the probability that a non-relevant document would be highly ranked, while the purpose of the CombMAX combination method was to minimize the number of relevant documents being poorly ranked. There is an inherent flaw in both of these methods; namely, they are specialized to handle specific problems without regard to their effect on the other retrieved documents: for example, the CombMIN combination method will promote the type of error that the CombMAX method is designed to minimize, and vice versa. The CombMED combination method is a simplistic approach to handling this, using the median similarity value to avoid both scenarios. What is clearly needed is some method of considering the documents' relative ranks, or similarity values, instead of simply attempting to select a single similarity value from a set of runs. To this end, we tried three other methods of combining retrieval runs: CombSUM, the summation of the set of similarity values, or, equivalently, the numerical mean of the set of similarity values; CombANZ, the average of the non-zero similarity values, which ignores the effect of a single run or query failing to retrieve a relevant document; and CombMNZ, which gives higher weight to documents retrieved by multiple retrieval methods. Clearly, there are more possibilities to consider; the advantages of those chosen are simplicity, in terms of both execution efficiency and implementation, and generality, in terms of not being specific to a given method or retrieval run.
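As a purely illustrative sketch (added here; not part of the original TREC-2 experiments), the six combination methods reduce to simple operations over the set of similarity values that the individual runs assign to a document. The Python below assumes each run is represented as a mapping from document identifiers to similarity values and treats a document missing from a run as having similarity zero; the CombMNZ form shown, summed similarity multiplied by the number of runs retrieving the document, is one natural reading of "higher weight for documents retrieved by multiple retrieval methods."

    from statistics import median

    def combine(runs, method="CombSUM"):
        """Fuse several runs, each a dict mapping doc_id -> similarity.

        A document missing from a run is treated as having similarity 0.
        Returns a dict mapping doc_id -> combined similarity.
        """
        all_docs = set().union(*runs)
        combined = {}
        for doc in all_docs:
            sims = [run.get(doc, 0.0) for run in runs]
            nonzero = [s for s in sims if s > 0.0]
            if method == "CombMIN":
                combined[doc] = min(sims)
            elif method == "CombMAX":
                combined[doc] = max(sims)
            elif method == "CombMED":
                combined[doc] = median(sims)
            elif method == "CombSUM":
                combined[doc] = sum(sims)
            elif method == "CombANZ":
                combined[doc] = sum(nonzero) / len(nonzero) if nonzero else 0.0
            elif method == "CombMNZ":
                combined[doc] = sum(sims) * len(nonzero)
            else:
                raise ValueError("unknown combination method: " + method)
        return combined

    # Example: fuse two toy runs and rank by the CombSUM score.
    run_a = {"DOC-1": 0.8, "DOC-2": 0.4}
    run_b = {"DOC-2": 0.9, "DOC-3": 0.3}
    fused = combine([run_a, run_b], method="CombSUM")
    ranking = sorted(fused, key=fused.get, reverse=True)  # ['DOC-2', 'DOC-1', 'DOC-3']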
These six methods were evaluated against the AP and WSJ test collections for topics 51 through 100, combining the similarity values of each of the five individual runs specified above. The results are shown in Table 7 below the results of each of the corresponding individual runs from Table 5. Note that while the CombMAX runs performed well compared with most of the individual runs, they did not do as well as the best of the individual runs in most cases. The CombMIN runs performed similarly for the AP collection, but performed worse than every individual run for the WSJ collection. The CombANZ runs and the CombMNZ runs both performed better than the best of the individual runs, with the CombMNZ runs performing only slightly better than the CombANZ runs for three of the four collections, and performing essentially the same for the fourth.
The primary reason for the similar performance of the two runs is that the two methods produce the same ranked sequence for all the documents retrieved by all five individual runs. Thus, the two methods can differ only for documents that some, but not all, of the individual runs retrieve.
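To spell out why the two rankings coincide on that subset (a brief derivation added here for clarity; the symbol S denotes the CombSUM value and n the number of runs), note that for a document retrieved by all n runs every similarity value is non-zero, so

\[
\mathrm{CombANZ} = \frac{S}{n}, \qquad \mathrm{CombMNZ} = n\,S, \qquad S = \mathrm{CombSUM}.
\]

With n = 5 fixed for such documents, both quantities are strictly increasing functions of S, so CombSUM, CombANZ, and CombMNZ order these documents identically.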
The CombSUM retrieval run was performed for each
of the nine collections on the two training CD-ROMs.
The results are shown in Table 5. Breaking this anal-
ysis down to a per-topic basis in Table 11, it can be