NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Description of the PRC CEO Algorithm for TREC
P. Thompson
National Institute of Standards and Technology
Donna K. Harman
way to weight the contributions of the models in the combined probability distribution used to rank the retrieved documents. Since various algorithms, such as p-norm, are expressed in terms of correlations rather than probability distributions, it was necessary to extend the CEO algorithm to handle correlations. So far this extension has been handled heuristically. If a retrieval method, e.g., one of the cosine methods, returned a value between 0 and 1 as a retrieval status value, the logistic transformation of this value was interpreted as an estimate of the mean of a logistically transformed beta distribution, which was provided as evidence to the decision maker. Since there was no basis on which to assign a standard deviation to this distribution, as called for by the CEO methodology, we assumed that all standard deviations were .4045, a value corresponding to a standard deviation of .1 in terms of probabilities.
All of the retrieval methods used by VPI&SU were combined with the CEO algorithm except for the Boolean. That is, we used weighted and unweighted cosine and inner product measures as well as p-norm measures with p = 1.0, 1.5, and 2.0. For measures such as the inner product and some of the p-norm results, which did not give a retrieval status value in the 0 to 1 range, the result was mapped to this interval by scaling the highest score of the method in question for a given topic to the highest score given by one of the cosine measures. For documents not retrieved in the top 200 in response to a given topic, whose actual scores were unknown, we used a default score halfway between 0 and the lowest score achieved by the method in question.
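The normalization steps described above can be illustrated with a minimal Python sketch. It is not the original g++ code. The function names, the clamping constant `eps`, and in particular the combination rule in `combine` are assumptions made for illustration: the sketch treats each method's logit as independent normal evidence and takes a precision-weighted mean, which with equal standard deviations reduces to a plain average of logits. The text does not spell out the actual CEO update.

```python
import math

LOGIT_SD = 0.4045  # fixed standard deviation on the log-odds scale (from the text)

def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 (eps is an assumption)."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))

def rescale(scores, target_max):
    """Scale a method's scores for one topic so its top score equals target_max."""
    top = max(scores.values())
    return {doc: s * target_max / top for doc, s in scores.items()}

def default_score(scores):
    """Score for unretrieved documents: halfway between 0 and the lowest score."""
    return min(scores.values()) / 2.0

def combine(methods, sd=LOGIT_SD):
    """Assumed rule: precision-weighted mean of logits, mapped back to [0, 1]."""
    docs = set().union(*(m.keys() for m in methods))
    precision = 1.0 / sd ** 2  # identical for every method, so it cancels below
    combined = {}
    for doc in docs:
        total = sum(precision * logit(m.get(doc, default_score(m))) for m in methods)
        mean = total / (precision * len(methods))
        combined[doc] = 1.0 / (1.0 + math.exp(-mean))
    return combined

# Toy scores for one topic (hypothetical values, not TREC data).
cosine = {"d1": 0.8, "d2": 0.4}
inner = {"d1": 12.0, "d3": 6.0}           # not in the 0 to 1 range
inner_scaled = rescale(inner, max(cosine.values()))
combined = combine([cosine, inner_scaled])
ranking = sorted(combined, key=combined.get, reverse=True)
```

In this toy case `d1` ranks first because both methods retrieved it with their top score, while `d2` and `d3` each receive one real score and one default score.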
The Boolean model was not included because it was not a ranked retrieval method. In the future we plan to extend our normalization techniques to use the Boolean results as well.
Figure 1 shows our summary official TREC results for topics 51-100 on the Wall Street Journal collection from the first CD-ROM.
Since TREC, we have experimented with weighting the combined methods according to their performance on the TREC data. In other words, we have attempted to determine an upper bound for performance based on knowledge of each method's performance on the actual test data. We used four different weighting schemes: the 11-point average, precision at 0.00 recall, precision at 0.10 recall, and unweighted (i.e., our official TREC results). We also tried using the five best methods, rather than the seven used for our official results, i.e., we excluded Pnorm1.5 and Pnorm2.0. None of the weights produced better results than the unweighted scheme. This was surprising. Figure 2 shows our summary results for CEO based on all seven methods using the 11-point average and additional relevance judgments made by VPI&SU. Figure 3 shows the same weighting scheme using only NIST relevance judgments.
Two immediate explanations suggest themselves. First, using overall averages may not be very useful. Second, our simple implementation of CEO assumes independence among the methods. To examine the first problem we intend to try weighting the methods on a topic-by-topic basis rather than by overall averages. Again, this would be a retrospective upper bound experiment. In terms of the CEO approach (Thompson 1990a,b), using only overall averages would be analogous to using only feedback from past searches, while using topic-specific weights would correspond to receiving feedback over several iterations of the same search.
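The performance measures used as weights can be made concrete with a short Python sketch of interpolated precision at the 11 standard recall levels, which covers the 11-point average as well as precision at 0.00 and 0.10 recall. The way a weight enters the CEO combination is not specified in the text, so `normalized_weights` simply rescales the per-method figures to sum to 1, as an assumption. The rankings, method names `A` and `B`, and relevance set are toy values, not TREC data.

```python
def interpolated_precision(ranking, relevant):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one topic."""
    levels = [i / 10 for i in range(11)]
    points = []  # (recall, precision) after each relevant document is retrieved
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    # Interpolated precision at a level is the best precision at any recall >= level.
    return [max((p for r, p in points if r >= level), default=0.0) for level in levels]

def eleven_point_average(ranking, relevant):
    """Mean of the 11 interpolated precision values."""
    return sum(interpolated_precision(ranking, relevant)) / 11

def normalized_weights(performance):
    """Assumed weighting: each method's measure divided by the total."""
    total = sum(performance.values())
    return {name: value / total for name, value in performance.items()}

# Toy example with two hypothetical methods and two relevant documents.
relevant = {"d1", "d3"}
rankings = {
    "A": ["d1", "d2", "d3", "d4"],
    "B": ["d2", "d1", "d4", "d3"],
}
performance = {name: eleven_point_average(r, relevant) for name, r in rankings.items()}
weights = normalized_weights(performance)
```

Precision at 0.00 recall and at 0.10 recall are the first two entries of the list returned by `interpolated_precision`, so the other weighting schemes can be derived from the same function.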
We propose to investigate the second problem by analyzing the overlap of pairs of runs of the various methods to determine dependence and thus perform CEO without the independence assumption.
The PRC portion of our experiments was run entirely on a Sun SPARCstation 2 with 16 megabytes of RAM. The CEO code was written in g++ (GNU C++).
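The overlap analysis proposed for relaxing the independence assumption could start from something like the following Python sketch. The text does not say which overlap measure would be used, so the Jaccard coefficient over the top-k retrieved sets is an assumption, as are the function names and the toy runs.

```python
from itertools import combinations

def top_k_overlap(run_a, run_b, k=200):
    """Jaccard overlap of the top-k documents of two ranked runs (assumed measure)."""
    a, b = set(run_a[:k]), set(run_b[:k])
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def pairwise_overlap(runs, k=200):
    """Overlap for every unordered pair of methods, keyed by sorted name pair."""
    return {
        (m1, m2): top_k_overlap(runs[m1], runs[m2], k)
        for m1, m2 in combinations(sorted(runs), 2)
    }

# Toy runs for one topic (hypothetical document identifiers).
runs = {
    "cosine": ["d1", "d2", "d3"],
    "inner": ["d1", "d2", "d4"],
    "pnorm": ["d5", "d6", "d7"],
}
overlaps = pairwise_overlap(runs)
```

Pairs with high overlap would be candidates for being treated as dependent. How such a measure would then enter the CEO update is left open here, as it is in the text.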