NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
E. Fox, M. Koushik, J. Shaw, R. Modlin, D. Rao
National Institute of Standards and Technology, Donna K. Harman

Table 4: Base Runs, Phase 2, Disc 1, 11-point averages

Run/Collection   AP      DOE     FR      WSJ     ZF
cosine.atn       0.1138  0.0543  0.0259  0.2740  0.0813
cosine.nnn       0.1890  0.0330  0.0504  0.2184  0.0946
inner.atn        0.1241  0.0609  0.0405  0.3224  0.0888
inner.nnn        0.1478  0.0252  0.0108  0.1329  0.0101
pnorml.0         0.3006  0.0876  0.0727  0.3085  0.1448

Table 5: Base Runs + Similarity Merge

Run/Collection   AP      DOE     FR      WSJ     ZF      Sim-Merge
cosine.atn       0.1138  0.0543  0.0259  0.2740  0.0813  0.1149
cosine.nnn       0.1890  0.0330  0.0504  0.2184  0.0946  0.1513
inner.atn        0.1241  0.0609  0.0405  0.3224  0.0888  0.1717
inner.nnn        0.1478  0.0252  0.0108  0.1329  0.0101  0.0075
pnorml.0         0.3006  0.0876  0.0727  0.3085  0.1448  0.1831

It should be noted that the p-norm run results were best in almost all situations for the given collection (except for WSJ, where inner.atn was slightly better).

6.2 Similarity Merge

TREC evaluations were to be done for the entire Disc 1 contents, so it was necessary to combine the results of the 5 sub-collections into an overall Disc 1 evaluation. This is a difficult matter, since each collection has a different number of relevant documents, and each was indexed separately. In 1993 we expect to consider this problem in more detail. As a first solution we decided on the simplest possible approach: combining the results based on similarity. Thus, for a particular retrieval approach, we merged all of the documents retrieved from the 5 sub-collection runs, sorted the 1000 documents found for each query based on similarity, and returned the 200 with the highest similarity values. For convenience, Table 5 shows the data from Table 4, but with an added column to show the Similarity Merge results. Clearly, some improvement is needed in this collection merging process.
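The Similarity Merge described above can be sketched as follows. This is an illustrative reconstruction, not the original implementation; the function name and data layout (per-query lists of document/similarity pairs keyed by sub-collection) are assumptions.

```python
def similarity_merge(sub_runs, keep=200):
    """Merge per-collection retrieval results for one query by raw similarity.

    sub_runs: dict mapping a sub-collection name (e.g. 'AP', 'DOE', 'FR',
              'WSJ', 'ZF') to a list of (doc_id, similarity) pairs,
              typically 200 per collection (1000 documents in all).
    Returns the `keep` documents with the highest similarity values,
    as (collection, doc_id, similarity) triples, best first.
    """
    pooled = []
    for collection, results in sub_runs.items():
        pooled.extend((collection, doc_id, sim) for doc_id, sim in results)
    # Sort the pooled documents by similarity, highest first,
    # and keep only the top `keep`.
    pooled.sort(key=lambda item: item[2], reverse=True)
    return pooled[:keep]
```

Note that the raw similarity values are not comparable across separately indexed collections, which is one reason some improvement is needed in this merging process.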
One might use training based on the number of relevant documents in a collection to predict a prior probability of finding a relevant document in that collection, and then use that probability to temper the similarity values.

6.3 Recall-Precision Merge

Another type of merger involves combining the several retrieval runs for a given sub-collection. To improve upon the Ad Hoc Merge used in Phase 1, we elected to train the merging process using recall-precision results. Thus, we considered the retrospective case of using the recall-precision tables from our evaluations to help determine which runs to draw from for merging. In particular, we use the following Recall-Precision Merge algorithm:

1. For each run to be merged, store the top 200 items on a stack, with the highest rank at the bottom of the stack.
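Step 1 above can be sketched as follows. This is a partial, illustrative sketch only, since the remaining steps of the algorithm follow later; the function name is an assumption, and "highest rank at the bottom" is read here as the largest rank number (the worst-ranked of the stored items) sitting at the bottom, so that popping the stack yields the best-ranked document first.

```python
def load_run_stack(run_results, depth=200):
    """Step 1 of the Recall-Precision Merge for one retrieval run:
    place the top `depth` items on a stack so that later steps can
    pop the best-ranked remaining document first.

    run_results: list of (doc_id, similarity) pairs, best first.
    Returns a Python list used as a stack (pop from the end).
    """
    top = run_results[:depth]          # keep only the top `depth` items
    # Reverse so the worst-ranked stored item is at the bottom and
    # stack.pop() returns the best-ranked item first.
    return list(reversed(top))
```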