NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
E. Fox, M. Koushik, J. Shaw, R. Modlin, D. Rao
edited by Donna K. Harman, National Institute of Standards and Technology

Figure 1: Decision Tree Example for query 52

    RULE 1: IF COSINE-2 = [0.032, 0.07)  THEN RELEVANCE = 0 (73.3%), RELEVANCE = 1 (26.7%)
    RULE 2: IF COSINE-2 = [0.07, 0.195]  THEN RELEVANCE = 0 (9.5%),  RELEVANCE = 1 (90.5%)

This indicates that the likelihood of relevance is about 0.27 for very low values from the second cosine run, and about 0.91 for higher values from that same run. In selecting this tree, the Decision Tree method suggests that ranking based solely on this cosine run would be wise. More complicated Decision Trees resulted for a number of queries, where several of the base runs' values had to be consulted. Unfortunately, no full ranking of run results using the Decision Trees could be completed in time for this report, so other, simpler methods were applied.

In Phase 1, a simple scheme was used for the results that were turned in: for each query, the best results from each of the runs were included until 200 distinct documents were found. This scheme is referred to as Ad Hoc Merge in the discussions below. In Phase 2, a more complex scheme, called Recall-Precision (R-P) Merge, was explored. Details and results are given in Section 6.

4 Systems description

The main machine used for the indexing and retrieval runs was a DECstation 5000 Model 25 with 40 MB of RAM; this is a MIPS R3000 CPU running at 25 MHz. The total disk space used for the project was on the order of 3 GB.

5 Results of Phase 1

Due to limitations of disk space, only a subset of the collection, the Wall Street Journal data from Disc 1, was used during the Phase 1 experimental runs. Relevance judgments were performed on a subset of this data by team members in order to obtain a large set of training information.
These were compared with the NIST judgment data and showed very high correlation: almost 90% of the documents we judged relevant were also judged relevant by NIST. In any case, the NIST judgments were used in the official (November 18, 1992) evaluation of our Phase 1 system, shown in Figure 2.
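The two rules of Figure 1 can be read as a lookup from the second cosine run's similarity score to an estimated probability of relevance. A minimal sketch (the function name and the handling of scores outside the two intervals are our assumptions; the intervals and percentages are taken from the figure):

```python
def relevance_probability(cosine_2):
    """Estimate P(relevant) for query 52 from the COSINE-2 score,
    following the two rules shown in Figure 1."""
    if 0.032 <= cosine_2 < 0.07:
        return 0.267  # RULE 1: very low cosine values -> ~27% relevant
    if 0.07 <= cosine_2 <= 0.195:
        return 0.905  # RULE 2: higher cosine values -> ~91% relevant
    # Scores outside both intervals are not covered by this tree
    # (behavior here is an assumption, not stated in the figure).
    raise ValueError("score outside the intervals covered by the tree")
```

Ranking documents by this probability reduces, for this query, to ranking by the COSINE-2 score itself, which is why the tree suggests relying on that single run.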
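The Ad Hoc Merge scheme of Phase 1 can be sketched as follows: walk down the ranked lists of the base runs in parallel, taking the next-best document from each run in turn, skipping duplicates, until 200 distinct documents are collected for the query. The exact interleaving order of the runs is not specified in the text, so the round-robin order below is an assumption:

```python
def ad_hoc_merge(runs, cutoff=200):
    """Merge ranked result lists (best document first) by taking the
    best remaining result from each run in turn until `cutoff`
    distinct documents are found.  Round-robin run order is an
    assumption; the original does not specify tie-breaking."""
    merged = []
    seen = set()
    depth = 0
    max_len = max(len(run) for run in runs)
    while len(merged) < cutoff and depth < max_len:
        for run in runs:
            # Skip runs that are exhausted or whose next document
            # was already contributed by another run.
            if depth < len(run) and run[depth] not in seen:
                seen.add(run[depth])
                merged.append(run[depth])
                if len(merged) >= cutoff:
                    break
        depth += 1
    return merged
```

For example, merging the lists `["a", "b", "c"]` and `["b", "d", "e"]` with a cutoff of 4 interleaves the top-ranked documents of each run and drops the duplicate `"b"`.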