NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
E. Fox, M. Koushik, J. Shaw, R. Modlin, D. Rao
Edited by Donna K. Harman, National Institute of Standards and Technology
Table 4: Base Runs, Phase 2, Disc 1, 11-point averages

Run/Collection    AP      DOE     FR      WSJ     ZF
cosine.atn        0.1138  0.0543  0.0259  0.2740  0.0813
cosine.nnn        0.1890  0.0330  0.0504  0.2184  0.0946
inner.atn         0.1241  0.0609  0.0405  0.3224  0.0888
inner.nnn         0.1478  0.0252  0.0108  0.1329  0.0101
pnorml.0          0.3006  0.0876  0.0727  0.3085  0.1448
Table 5: Base Runs + Similarity Merge

Run/Collection    AP      DOE     FR      WSJ     ZF      Sim-Merge
cosine.atn        0.1138  0.0543  0.0259  0.2740  0.0813  0.1149
cosine.nnn        0.1890  0.0330  0.0504  0.2184  0.0946  0.1513
inner.atn         0.1241  0.0609  0.0405  0.3224  0.0888  0.1717
inner.nnn         0.1478  0.0252  0.0108  0.1329  0.0101  0.0075
pnorml.0          0.3006  0.0876  0.0727  0.3085  0.1448  0.1831
It should be noted that the p-norm run results were best in almost all situations for the given
collection (except for WSJ, where inner.atn was slightly better).
6.2 Similarity Merge
TREC evaluations were to be done for the entire Disc 1 contents, so it was necessary to combine the
results of the 5 sub-collections into an overall Disc 1 evaluation. This is a difficult matter, since
each collection has a different number of relevant documents and each was indexed separately. In
1993 we expect to consider this problem in more detail.
As a first solution to the problem we decided on the simplest possible approach: combine
the results based on similarity. Thus, for a particular retrieval approach, we merged all of the
documents retrieved from the 5 sub-collection runs, sorted the 1000 documents found for each
query by similarity, and returned the 200 with the highest similarity values.
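The merging step above can be sketched as follows. This is not the authors' code; it is a minimal illustration, with made-up function names and toy data, of pooling ranked lists from the sub-collections and re-ranking purely by raw similarity score.

```python
def similarity_merge(sub_collection_runs, cutoff=200):
    """Pool (doc_id, similarity) pairs from several sub-collection
    runs and re-rank the pooled set by raw similarity alone,
    returning the top `cutoff` documents."""
    pooled = []
    for collection, ranking in sub_collection_runs.items():
        for doc_id, sim in ranking:
            pooled.append((sim, collection, doc_id))
    pooled.sort(reverse=True)  # highest similarity first
    return [(coll, doc) for _, coll, doc in pooled[:cutoff]]

# Toy example with two tiny sub-collections and invented scores:
runs = {
    "AP":  [("AP-1", 0.91), ("AP-2", 0.40)],
    "WSJ": [("WSJ-7", 0.75)],
}
print(similarity_merge(runs, cutoff=2))
```

Note that this treats similarity values from differently indexed collections as directly comparable, which is exactly the weakness discussed below.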
For convenience, Table 5 shows the data from Table 4, but with an added column to show the
Similarity Merge results.
Clearly, some improvement is needed in this collection merging process. One might use training
based on the number of relevant documents in a collection to predict a prior probability of finding
a relevant document in that collection, and then use that probability to temper the similarity values.
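One hypothetical reading of this suggestion (not implemented in the paper) is to scale each document's similarity by a training-derived per-collection prior before pooling. The `priors` weights below are invented for illustration.

```python
def prior_weighted_merge(sub_collection_runs, priors, cutoff=200):
    """Variant of the similarity merge in which each raw similarity
    is tempered by a per-collection prior probability of relevance
    (hypothetical weights, e.g. estimated from training queries)."""
    pooled = []
    for collection, ranking in sub_collection_runs.items():
        weight = priors.get(collection, 1.0)
        for doc_id, sim in ranking:
            pooled.append((weight * sim, collection, doc_id))
    pooled.sort(reverse=True)  # highest tempered score first
    return [(coll, doc) for _, coll, doc in pooled[:cutoff]]

# Toy example: FR's lower prior demotes its otherwise strong score.
runs = {"AP": [("AP-1", 0.80)], "FR": [("FR-3", 0.90)]}
priors = {"AP": 1.0, "FR": 0.5}
print(prior_weighted_merge(runs, priors, cutoff=2))
```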
6.3 Recall-Precision Merge
Another type of merger involves combining the several retrieval runs for a given sub-collection. To
improve upon the Ad Hoc Merge used in Phase 1, we elected to train the merging process using
recall-precision results. Thus, we considered the retrospective case of using the recall-precision
tables from our evaluations, to help determine which runs to draw from for merging.
In particular, we use the following Recall-Precision Merge algorithm:
1. For each run to be merged, store the top 200 items on a stack, with the highest rank at the
bottom of the stack.