SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Description of the PRC CEO Algorithm for TREC
chapter
P. Thompson
National Institute of Standards and Technology
Donna K. Harman
way to weight the contributions of the models in the combined probability distribution used
to rank the retrieved documents. Since various algorithms, such as p-norm, are expressed
in terms of correlations rather than probability distributions, it was necessary to extend the
CEO algorithm to handle correlations. So far this extension has been handled in a heuristic
fashion. If a retrieval method, e.g., one of the cosine methods, returned a value between 0
and 1 as a retrieval status value, the logistic transformation of this weight was interpreted as
an estimate of the mean of a logistically transformed beta distribution, which was provided
as evidence to the decision maker. Since there was no basis on which to assign a
standard deviation to this distribution, as called for by the CEO methodology, an
assumption was made that all standard deviations were .4045, a value corresponding to a
standard deviation of .1 in terms of probabilities.
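The transformation described above can be sketched as follows. This is a minimal illustration in Python, not the original CEO code. The function names `logit` and `evidence_from_rsv` and the clipping constant `eps` are assumptions introduced here; only the fixed standard deviation of .4045 is taken from the text.

```python
import math

FIXED_SD = 0.4045  # standard deviation assumed for every method, as stated in the text


def logit(p, eps=1e-6):
    """Log-odds of p. eps is an assumption that keeps p away from 0 and 1."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def evidence_from_rsv(rsv):
    """Return (mean, sd) on the log-odds scale for a retrieval status value in [0, 1]."""
    return logit(rsv), FIXED_SD


# Hypothetical usage: a cosine score of 0.5 maps to a log-odds mean of 0.
mean, sd = evidence_from_rsv(0.5)
```

Clipping is needed only because a score of exactly 0 or 1 has no finite log-odds; how the original code treated such scores is not stated.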
All of the retrieval methods used by VPI&SU were combined with the CEO algorithm
except for the Boolean. That is, we used weighted and unweighted cosine and inner
product measures as well as p-norm measures with p values of 1.0, 1.5, and 2.0. For measures
such as the inner product and some of the p-norm runs, which did not give a retrieval status
value in the 0 to 1 range, the result was mapped to this interval by scaling the highest score of
the method in question for a given topic to the highest score given by one of the cosine
measures. Default scores halfway between 0 and the lowest score achieved by a particular
method were used for documents not retrieved in the top 200 in response to a given topic,
since the actual scores of these documents were unknown. The Boolean model was not
included because it is not a ranked retrieval method. In the future we plan to extend our
normalization techniques to use the Boolean results as well. Figure 1 shows our summary
official TREC results for topics 51-100 on the Wall Street Journal collection from the first
CD-ROM.
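The score normalization and default-score rule described above might look like the following sketch. The function names, the dictionary representation of a run, and the sample scores are all hypothetical; the sketch also assumes non-negative scores and a simple linear rescaling, which the text suggests but does not spell out.

```python
def rescale_to_cosine(scores, cosine_max):
    """Scale one method's scores for one topic so that its top score equals cosine_max.

    scores maps document id -> raw score; cosine_max is the highest score
    a cosine measure gave for the same topic.
    """
    top = max(scores.values())
    if top <= 0:
        raise ValueError("expected at least one positive score")
    factor = cosine_max / top
    return {doc: s * factor for doc, s in scores.items()}


def default_score(scores):
    """Score for a document the method did not retrieve: halfway between 0 and its lowest score."""
    return min(scores.values()) / 2.0


def score_or_default(scores, doc_id):
    """Look up a document's score, falling back to the default for unretrieved documents."""
    return scores.get(doc_id, default_score(scores))


# Hypothetical inner-product scores for one topic, rescaled to a cosine maximum of 0.8.
inner = {"d1": 12.0, "d2": 6.0, "d3": 3.0}
scaled = rescale_to_cosine(inner, cosine_max=0.8)
missing = score_or_default(scaled, "d9")
```

Because the rescaling is linear, taking the default before or after rescaling gives the same value.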
Since TREC, we have experimented with weighting the combined methods according to
their performance on the TREC data. In other words, we have attempted to determine
an upper bound for performance based on knowledge of each method's performance on the
actual test data. We used four different weighting schemes: the 11-point average,
precision at 0.00 recall, precision at 0.10 recall, and unweighted (i.e., our official TREC
results). We also tried using the five best methods, rather than the seven used for our
official results, i.e., we excluded Pnorm1.5 and Pnorm2.0. None of the weighting schemes
produced better results than the unweighted scheme, which was surprising. Figure 2 shows our
summary results for CEO based on all seven methods using the 11-point average and
additional relevance judgments made by VPI&SU. Figure 3 shows the same weighting
scheme using only NIST relevance judgments.
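For reference, the 11-point average is commonly computed as the mean of interpolated precision at recall levels 0.0, 0.1, ..., 1.0. The sketch below follows that common definition for a single topic. The function name and sample data are hypothetical, and how the resulting value was turned into a CEO weight is not specified in the text.

```python
def eleven_point_average(ranking, relevant):
    """Mean interpolated precision at recall 0.0, 0.1, ..., 1.0 for one topic.

    ranking is a list of document ids in rank order; relevant is the set of
    relevant document ids for the topic.
    """
    if not relevant:
        raise ValueError("topic has no relevant documents")
    points = []  # (recall, precision) observed at each relevant document
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    total = 0.0
    for i in range(11):
        level = i / 10
        # Interpolated precision: best precision at any recall >= this level.
        reachable = [p for r, p in points if r >= level - 1e-12]
        total += max(reachable, default=0.0)
    return total / 11


# Hypothetical topic with two relevant documents, found at ranks 1 and 3.
score = eleven_point_average(["a", "b", "c", "d"], {"a", "c"})
```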
Two immediate explanations suggest themselves. First, overall averages may not be
very informative as weights. Second, our simple implementation of CEO assumes independence
among the methods. To examine the first problem, we intend to try weighting the methods on a
topic-by-topic basis rather than by overall averages. Again, this would be a retrospective upper
bound experiment. In terms of the CEO approach (Thompson 1990a,b), using only overall
averages would be analogous to using only feedback from past searches, while using topic-
specific weights would correspond to receiving feedback over several iterations of the same
search. We propose to investigate the second problem by analyzing the overlap of pairs of
runs of the various methods to determine dependence and thus perform CEO without the
independence assumption.
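One simple way to begin the proposed overlap analysis is to compare the top-ranked documents of each pair of runs. The sketch below uses the Jaccard coefficient, which is a choice made here for illustration and is not named in the text. The function names and sample runs are hypothetical, and the default depth of 200 simply echoes the top-200 cutoff mentioned earlier.

```python
from itertools import combinations


def overlap(run_a, run_b, depth=200):
    """Jaccard overlap of the top `depth` documents of two ranked lists."""
    a, b = set(run_a[:depth]), set(run_b[:depth])
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_overlap(runs, depth=200):
    """Overlap for every unordered pair of methods.

    runs maps method name -> ranked list of document ids for one topic.
    """
    return {
        (m, n): overlap(runs[m], runs[n], depth)
        for m, n in combinations(sorted(runs), 2)
    }


# Hypothetical runs for one topic.
runs = {
    "cosine": ["d1", "d2", "d3"],
    "pnorm1.0": ["d2", "d3", "d4"],
    "inner": ["d5", "d6", "d7"],
}
table = pairwise_overlap(runs)
```

High overlap between two methods would suggest that they should not be treated as independent sources of evidence; turning such a measure into a dependence model for CEO is a separate step not covered by this sketch.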
The PRC portion of our experiments was run entirely on a Sun SPARCstation 2 with 16
megabytes of RAM. The CEO code was written in C++ and compiled with g++.