SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Combination of Multiple Searches
chapter
E. Fox
J. Shaw
National Institute of Standards and Technology
D. K. Harman
Table 7: Comparison of combination runs and the five individual
runs (Ad-hoc Topics 51-100).
               Average Precision                  R-Precision
Run       AP-1    WSJ-1   AP-2    WSJ-2     AP-1    WSJ-1   AP-2    WSJ-2
SV        0.2387  0.2203  0.2543  0.1503    0.2624  0.2616  0.2649  0.1744
LV        0.2435  0.2414  0.2664  0.1633    0.2672  0.2800  0.2704  0.1860
Pn1.0     0.2810  0.2941  0.3004  0.2206    0.2688  0.3221  0.3165  0.2367
Pn1.5     0.3122  0.3199  0.3332  0.2327    0.2976  0.3443  0.3412  0.2511
Pn2.0     0.3027  0.3217  0.3300  0.2325    0.2968  0.3470  0.3339  0.2442
CombMAX   0.2856  0.3205  0.3337  0.2343    0.3013  0.3484  0.3431  0.2449
CombMIN   0.2863  0.1924  0.3047  0.1308    0.3036  0.2214  0.2980  0.1395
CombSUM   0.3493  0.3605  0.3748  0.2752    0.3590  0.3767  0.3732  0.2851
CombANZ   0.3493  0.3367  0.3748  0.2465    0.3590  0.3517  0.3732  0.2590
CombMNZ   0.3059  0.3368  0.3516  0.2467    0.3175  0.3517  0.3578  0.2590
CombMED   0.2943  0.3204  0.3335  0.2328    0.2977  0.3444  0.3414  0.2518
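
As a reading aid for the six combination rows of Table 7, the sketch
below restates the combination functions as they are defined for these
run names: maximum, minimum, sum, sum divided by the number of nonzero
similarities, sum multiplied by the number of nonzero similarities, and
median. The function name combine, the list-of-scores input, and the
convention that a run which did not retrieve a document contributes 0.0
are assumptions made for this illustration, not details taken from the
table.

```python
from statistics import median


def combine(scores, method):
    """Combine one document's similarity values from several runs.

    scores: one float per run; 0.0 is assumed to mean "not retrieved".
    method: one of the run names used in Table 7.
    """
    nonzero = [s for s in scores if s != 0.0]
    total = sum(scores)
    if method == "CombMAX":
        return max(scores)
    if method == "CombMIN":
        return min(scores)
    if method == "CombSUM":
        return total
    if method == "CombANZ":
        # Average over the runs that actually retrieved the document.
        return total / len(nonzero) if nonzero else 0.0
    if method == "CombMNZ":
        # Reward documents retrieved by many runs.
        return total * len(nonzero)
    if method == "CombMED":
        return median(scores)
    raise ValueError(f"unknown combination method: {method}")


# Hypothetical similarities of one document in three runs.
example = [0.25, 0.0, 0.5]
results = {m: combine(example, m)
           for m in ("CombMAX", "CombMIN", "CombSUM",
                     "CombANZ", "CombMNZ", "CombMED")}
```

When every run retrieves a document with a nonzero similarity, CombANZ
and CombMNZ are simply CombSUM rescaled by the same constant, so the
three functions differ only in how they treat documents missing from
some runs.
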
than the combined boolean schemes did they experience
improved retrieval performance when combining
different query methods. This differs from our results
in several ways. Most importantly, the stage at which
the different methods were combined differed: Belkin et al.
combined the query representations before performing
the actual retrieval, while we combined the similarity
values produced from retrieval on each method individually.
The difference between the two methodologies
can best be demonstrated using the standard vector
space model: Belkin et al. combined by summing the
vector representations of each query, while our method
is analogous to summing the cosines of the angles
between each query vector and a document. It is easily shown
that the cosine of the angle between a document vector
and a combined query vector (that is, the sum of
two query vectors, as in the Belkin et al. approach) is
not equal to the sum of the cosines between the document
vector and the two separate query vectors. Other
differences between the two methodologies include the
fact that our P-norm queries performed better on average
than our natural language vector queries, with
exceptions on a per-query basis. We used only one
P-norm query and modified the operator weights, while
Belkin et al. used five different boolean queries.
Finally, combining five runs with equal weights
actually improved performance over each individual run.
However, one common trend emerges from both experiments:
the more query representations considered, the
better the results.
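
The inequality between the two combination stages can be checked
numerically. The sketch below uses a made-up two-dimensional document
vector and two made-up query vectors (all values are illustrative
assumptions) and compares the cosine against the summed query vector
with the sum of the two individual cosines.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative vectors, not taken from the experiments.
d = [1.0, 0.0]    # document
q1 = [1.0, 0.0]   # first query representation
q2 = [0.0, 1.0]   # second query representation

# Query-level combination: sum the query vectors, then retrieve.
combined_query = [a + b for a, b in zip(q1, q2)]
query_level = cosine(d, combined_query)

# Score-level combination: retrieve with each query, then sum cosines.
score_level = cosine(d, q1) + cosine(d, q2)
```

Here the combined query lies at 45 degrees to the document, so
query_level is cos(45 degrees), roughly 0.707, while score_level is
1.0 + 0.0 = 1.0.
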
4.3 Future Exploration
Planned future work includes studying the following:
* Individually weighting various methods' similarity
values when performing combination runs.
* Normalization methods to allow combination of
runs made with different weighting schemes.
* Extending the analysis to all combinations of three
and four retrieval runs.
* Considering more/different query types.
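
To make the first two items concrete, the following is one
hypothetical way such a combination could look; it is not a method
evaluated in this paper. It assumes min-max scaling as the
normalization and a weighted sum as the combination, and the names
min_max_normalize and weighted_sum, the dictionary run format, and
the example weights are all invented for the illustration.

```python
def min_max_normalize(run):
    """Scale one run's similarities to [0, 1].

    run: dict mapping document id -> raw similarity value.
    Min-max scaling is an assumed choice of normalization.
    """
    if not run:
        return {}
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        # All scores equal: treat every retrieved document alike.
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def weighted_sum(runs, weights):
    """Weighted sum of normalized similarities across runs.

    Documents missing from a run contribute nothing for that run.
    """
    combined = {}
    for run, weight in zip(runs, weights):
        for doc, score in min_max_normalize(run).items():
            combined[doc] = combined.get(doc, 0.0) + weight * score
    return combined


# Two hypothetical runs whose raw scores are on different scales.
run_a = {"d1": 10.0, "d2": 20.0, "d3": 30.0}
run_b = {"d1": 0.5, "d3": 0.25}
fused = weighted_sum([run_a, run_b], weights=[1.0, 2.0])
ranking = sorted(fused, key=fused.get, reverse=True)
```

With these inputs the second run is weighted twice as heavily, which
moves d1 ahead of d3 even though d3 has the highest raw score in the
first run.
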
5 Acknowledgements
This research was supported in part by DARPA and by
PRC Inc. We also thank Russell Modlin, M. Prabhakar
Koushik and Durgesh Rao for their collaboration during
TREC-1.
References
[1] Belkin, N.J., Cool, C., Croft, W.B., Callan, J.P.
(1993, June). The Effect of Multiple Query
Representations on Information Retrieval Performance.
Proc. 15th Int'l Conf. on R&D in IR (SIGIR '93),
Pittsburgh, 339-346.
[2] Buckley, C. (1985, May). Implementation of the
SMART information retrieval system. Technical
Report 85-686, Cornell University, Department of
Computer Science.
[3] Fox, E.A. (1983, August). Extending the Boolean
and Vector Space Models of Information Retrieval
with P-Norm Queries and Multiple Concept Types.
Cornell University Department of Computer Science
dissertation.
[4] Fox, E.A., Koushik, M.P., Shaw, J., Modlin, R.,
Rao, D. (1993). Combining Evidence from Multiple
Searches. In The First Text REtrieval Conference