Table 5: Average Precision and Exact R-Precision for the five individual runs (Ad-hoc Topics 51-100). Chg/Max gives the percentage change of the CombSUM run relative to the best of the five individual runs.

Average non-interpolated Precision

          |                 Disk 1                 |              Disk 2             |  Both
Run       |   AP      DOE     FR      WSJ     ZF   |   AP      FR      WSJ     ZF    |  Disks
----------+----------------------------------------+---------------------------------+--------
sV        | 0.2387  0.0605  0.0222  0.2203  0.1026 | 0.2543  0.0330  0.1503  0.0770  | 0.1418
LV        | 0.2435  0.0586  0.0302  0.2414  0.0864 | 0.2664  0.0324  0.1633  0.0753  | 0.1555
Pn1.0     | 0.2605  0.0658  0.0611  0.2941  0.1110 | 0.3004  0.0879  0.2206  0.1003  | 0.1988
Pn1.5     | 0.2939  0.0771  0.0639  0.3199  0.1278 | 0.3332  0.0878  0.2327  0.1065  | 0.2242
Pn2.0     | 0.2849  0.0847  0.0706  0.3217  0.1278 | 0.3300  0.0865  0.2325  0.1136  | 0.2250
CombSUM   | 0.3493  0.1001  0.0741  0.3605  0.1475 | 0.3748  0.0842  0.2752  0.1273  | 0.2620
Chg/Max   | 18.84%  18.18%   4.95%  12.06%  15.41% | 12.48%  -4.20%  18.26%  12.05%  | 16.44%

Exact R-Precision

          |                 Disk 1                 |              Disk 2             |  Both
Run       |   AP      DOE     FR      WSJ     ZF   |   AP      FR      WSJ     ZF    |  Disks
----------+----------------------------------------+---------------------------------+--------
sV        | 0.2624  0.0564  0.0183  0.2616  0.1180 | 0.2649  0.0202  0.1744  0.0922  | 0.2169
LV        | 0.2672  0.0493  0.0274  0.2800  0.0802 | 0.2704  0.0176  0.1860  0.0843  | 0.2311
Pn1.0     | 0.2688  0.0661  0.0533  0.3221  0.1123 | 0.3165  0.0971  0.2367  0.0969  | 0.2708
Pn1.5     | 0.2976  0.0762  0.0572  0.3443  0.1218 | 0.3412  0.1016  0.2511  0.1068  | 0.2962
Pn2.0     | 0.2968  0.0765  0.0654  0.3470  0.1254 | 0.3339  0.0820  0.2442  0.1158  | 0.3008
CombSUM   | 0.3590  0.0950  0.0619  0.3767  0.1357 | 0.3732  0.0887  0.2851  0.1216  | 0.3292
Chg/Max   | 20.63%  24.18%  -5.35%   8.55%   8.21% |  9.37% -12.69%  13.54%   5.00%  |  9.44%

Table 6.

The rationale behind the CombMIN combination method was to minimize the probability that a non-relevant document would be highly ranked, while the purpose of the CombMAX combination method was to minimize the number of relevant documents being poorly ranked. There is an inherent flaw in both of these methods: each is specialized to handle one specific problem without regard to its effect on the other retrieved documents. For example, the CombMIN combination method will promote the type of error that the CombMAX method is designed to minimize, and vice versa. The CombMED combination method is a simplistic approach to handling this, using the median similarity value to avoid both scenarios. What is clearly needed is some method of considering the documents' relative ranks, or similarity values, rather than simply selecting a single similarity value from the set of runs. To this end, we tried three other methods of combining retrieval results: CombSUM, the summation of the set of similarity values (or, equivalently for ranking purposes, the numerical mean of the set of similarity values); CombANZ, the average of the non-zero similarity values, which ignores the effect of a single run or query failing to retrieve a relevant document; and CombMNZ, which gives higher weight to documents retrieved by multiple retrieval methods. Clearly, there are more possibilities to consider; the advantages of those chosen are simplicity, in terms of both execution efficiency and implementation, and generality, in terms of not being specific to a given method or retrieval run.
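As a concrete illustration of these operators, the following Python sketch implements all six over per-run similarity values. It is not code from the paper: the representation (one dictionary per run mapping document identifiers to similarities), the assumption that scores are already normalized to a common range, the treatment of a document a run did not retrieve as having similarity 0, and the names combine and rank are ours.

    from statistics import median

    def combine(runs, method="CombSUM"):
        """Combine per-run similarity values into one score per document.

        runs   -- list of dicts, one per retrieval run, mapping doc_id -> similarity
                  (assumed normalized to a common range; an unretrieved doc counts as 0)
        method -- one of CombMIN, CombMAX, CombMED, CombSUM, CombANZ, CombMNZ
        """
        combined = {}
        all_docs = set().union(*(run.keys() for run in runs))
        for doc in all_docs:
            sims = [run.get(doc, 0.0) for run in runs]
            hits = sum(1 for s in sims if s > 0.0)   # number of runs that retrieved doc
            total = sum(sims)                        # the CombSUM value
            if method == "CombMIN":
                combined[doc] = min(sims)
            elif method == "CombMAX":
                combined[doc] = max(sims)
            elif method == "CombMED":
                combined[doc] = median(sims)
            elif method == "CombSUM":
                combined[doc] = total
            elif method == "CombANZ":                # average of the non-zero similarities
                combined[doc] = total / hits if hits else 0.0
            elif method == "CombMNZ":                # reward docs retrieved by many runs
                combined[doc] = total * hits
            else:
                raise ValueError("unknown combination method: " + method)
        return combined

    def rank(combined):
        """Order documents by combined similarity, best first."""
        return sorted(combined, key=combined.get, reverse=True)

Note that for a document retrieved by all n runs the non-zero count is exactly n, so CombANZ reduces to CombSUM/n and CombMNZ to CombSUM*n; over that set of documents the three methods therefore produce the same ordering, which is consistent with the very similar CombANZ and CombMNZ results discussed below.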
These six methods were evaluated against the AP and WSJ test collections for topics 51 through 100, combining the similarity values of each of the five individual runs specified above. The results are shown in Table 7, below the results of each of the corresponding individual runs from Table 5. Note that while the CombMAX runs performed well compared with most of the individual runs, they did not do as well as the best of the individual runs in most cases. The CombMIN runs performed similarly for the AP collection, but worse than every individual run for the WSJ collection. The CombANZ and CombMNZ runs both performed better than the best of the individual runs, with the CombMNZ runs performing only slightly better than the CombANZ runs for three of the four collections and essentially the same for the fourth. The primary reason for the similar performance of the two runs is that the two methods produce the same ranked sequence for all the documents retrieved by all five individual runs. Thus, the

The CombSUM retrieval run was performed for each of the nine collections on the two training CD-ROMs. The results are shown in Table 5. Breaking this analysis down to a per-topic basis in Table 11, it can be