NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Combining Evidence from Multiple Searches
E. Fox
M. Koushik
J. Shaw
R. Modlin
D. Rao
National Institute of Standards and Technology
Donna K. Harman
Figure 1: Decision Tree Example for query 52

RULE 1: IF   COSINE-2 = [0.032, 0.07)
        THEN RELEVANCE = 0   73.3%
             RELEVANCE = 1   26.7%

RULE 2: IF   COSINE-2 = [0.07, 0.195]
        THEN RELEVANCE = 0    9.5%
             RELEVANCE = 1   90.5%
This indicates that the likelihood of relevance is about .27 for very low values from the second
cosine run, and about .91 for higher values from that same run. By selecting this tree, the
Decision Tree method suggests that ranking based solely on this cosine run would be wise. More
complicated Decision Trees, in which several of the base runs' values had to be consulted, resulted
for a number of queries. Unfortunately, no full ranking of run results using the Decision Trees
could be completed in time for this report, so other, simpler methods were applied.
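The two-rule tree in Figure 1 can be read as a simple lookup from the second cosine run's score to a relevance likelihood. The sketch below is purely illustrative: the thresholds and probabilities come from the figure, but the function name and structure are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the two rules from Figure 1 (query 52).
# Thresholds and probabilities are taken from the figure; everything
# else (names, out-of-range behavior) is an illustrative assumption.

def relevance_likelihood(cosine2: float) -> float:
    """Return P(relevant) for a document given its COSINE-2 score."""
    if 0.032 <= cosine2 < 0.07:     # RULE 1: very low cosine values
        return 0.267
    if 0.07 <= cosine2 <= 0.195:    # RULE 2: higher cosine values
        return 0.905
    return 0.0                      # scores outside the observed range

print(relevance_likelihood(0.05))   # low-score band: 0.267
print(relevance_likelihood(0.10))   # high-score band: 0.905
```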
In Phase 1, a simple scheme was used for the results that were turned in. Essentially, for each
query, the best results from each of the runs were included until 200 distinct documents were
found. This scheme is referred to as Ad Hoc Merge in discussions below.
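The Ad Hoc Merge scheme above can be sketched as a round-robin walk down the base runs' rankings. Note the round-robin order and all names here are assumptions; the text says only that the best results from each run were included until 200 distinct documents were found.

```python
# Minimal sketch of an Ad Hoc Merge, assuming a round-robin over the
# base runs' ranked lists (best first) with deduplication.

def ad_hoc_merge(runs, limit=200):
    """runs: list of ranked document-id lists (best first) for one query."""
    merged, seen = [], set()
    for rank in range(max(len(r) for r in runs)):
        for run in runs:
            if rank < len(run) and run[rank] not in seen:
                seen.add(run[rank])
                merged.append(run[rank])
                if len(merged) == limit:
                    return merged
    return merged

print(ad_hoc_merge([["d1", "d2", "d3"], ["d2", "d4"]], limit=3))
# → ['d1', 'd2', 'd4']
```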
In Phase 2, a more complex system was explored, called Recall-Precision (R-P) Merge. Details
and results are given in Section 6.
4 Systems description
The main machine used for the indexing and retrieval runs was a DECstation 5000 Model 25
with 40MB of RAM. This is a MIPS R3000 CPU running at 25MHz. The total disk space used
for the project was on the order of 3 GB.
5 Results of Phase 1
Due to limitations of disk space, only a subset of the collection, comprising Disc 1 of the Wall
Street Journal, was used during Phase 1 experimental runs. Relevance judgments were performed
on a subset of this data by team members, in order to obtain a large set of training information.
These were compared with the NIST judgement data and showed very high correlation. (Almost
90% of the documents we judged relevant were judged relevant by NIST.) In any case, the NIST
judgments were used in the official (November 18, 1992) evaluation of our Phase 1 system, shown
in Figure 2.
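The agreement figure quoted above is simply the fraction of team-judged-relevant documents that NIST also judged relevant, i.e. the intersection size divided by the team set size. The document ids below are invented for illustration; only the calculation is the point.

```python
# Illustrative agreement calculation; document ids are made up.
team_relevant = {"WSJ0001", "WSJ0002", "WSJ0003", "WSJ0004", "WSJ0005",
                 "WSJ0006", "WSJ0007", "WSJ0008", "WSJ0009", "WSJ0010"}
nist_relevant = {"WSJ0001", "WSJ0002", "WSJ0003", "WSJ0004", "WSJ0005",
                 "WSJ0006", "WSJ0007", "WSJ0008", "WSJ0009", "WSJ0099"}

# Fraction of team-relevant docs confirmed relevant by NIST.
agreement = len(team_relevant & nist_relevant) / len(team_relevant)
print(agreement)   # 0.9
```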