NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Donna K. Harman (ed.), National Institute of Standards and Technology

Description of the PRC CEO Algorithm for TREC

Paul Thompson
PRC Inc., Mail Stop 5S3
1500 Planning Research Drive
McLean, VA 22102
Phone: 703/556-1923
Email: thompson[OCRerr]aul@po.gis.prc.com

This paper describes work done on the TREC project at PRC Inc. in collaboration with Professor Edward Fox and his colleagues at Virginia Polytechnic Institute and State University (VPI&SU). The reader should refer to the description of their system included in these proceedings for further details on the common processing of the TREC data shared by PRC and VPI&SU (Fox et al. 1993). PRC developed an algorithm, the Combination of Expert Opinion (CEO), which combined the results of VPI&SU's runs. VPI&SU used a different combination technique for their final results.

Originally the intent was that the CEO algorithm would be integrated with the SMART system used by VPI&SU. Combination of results would take place at two levels: a lower level, combining individual document features within a particular retrieval method, and an upper level, combining the output of the individual methods themselves, i.e., the various cosine and p-norm methods used by VPI&SU. Furthermore, we had originally hoped to train the CEO algorithm, so that the weighting of the various methods would be optimized based on relevance judgments. For the official TREC results we were only able to use the upper level of CEO without any training. Since then we have done additional retrospective experiments in which the different methods are weighted in the CEO algorithm by one of several measures of their performance for TREC.
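The upper level combination described above, in which the outputs of several retrieval methods are merged with per-method weights, can be illustrated with a minimal sketch. The pooling rule used here, a weighted average of log-odds, is a common way to combine probability estimates and is an assumption for illustration only; it is not claimed to be the exact CEO formula. The names combine_experts, runs, weights, and default are hypothetical, and each method's score is assumed to be already expressed as a probability of relevance.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_experts(runs, weights=None, default=0.5):
    """Combine per-method relevance probabilities into one estimate per document.

    runs:    dict mapping method name -> dict of doc_id -> probability
    weights: dict mapping method name -> non-negative weight
             (hypothetical stand-in for a performance measure);
             equal weights are used when omitted
    default: probability assumed for a document a method did not score
             (an assumption of this sketch)
    """
    methods = list(runs)
    if weights is None:
        weights = {m: 1.0 for m in methods}
    total = sum(weights[m] for m in methods)
    if methods and total <= 0:
        raise ValueError("weights must sum to a positive value")

    doc_ids = set().union(*(runs[m].keys() for m in methods))
    combined = {}
    for doc in doc_ids:
        # Weighted mean of the experts' log-odds for this document.
        pooled = sum(
            weights[m] * logit(runs[m].get(doc, default)) for m in methods
        ) / total
        combined[doc] = sigmoid(pooled)
    return combined


def rank(combined):
    """Return doc ids ordered by decreasing combined probability."""
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical probabilities from two methods for two documents.
runs = {
    "cosine": {"d1": 0.9, "d2": 0.2},
    "pnorm": {"d1": 0.1, "d2": 0.2},
}
unweighted = combine_experts(runs)
weighted = combine_experts(runs, weights={"cosine": 3.0, "pnorm": 1.0})
```

With equal weights the two methods' opposite opinions about d1 cancel out, while giving the cosine method more weight pulls the combined estimate for d1 toward its opinion. In a trained setting the weights would come from relevance judgments; here they are supplied by hand.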
Combination of Expert Opinion

The statistical technique of CEO provides a solution to the problem of combining different probabilistic models of document retrieval. This technique is expected to result in improved precision and recall over that provided by any one model, or method, since research has shown that various retrieval models retrieve different sets of relevant documents (Katzer et al. 1982, Fox et al. 1988).

In the Bayesian formulation of the CEO problem (Lindley 1983), a decision maker is interested in some parameter or event and has a prior, or initial, distribution or probability for that parameter or event. The decision maker revises the distribution upon consulting several experts, each with his or her own distribution or probability for the parameter or event. To effect this revision, the decision maker must assess the relative expertise of the experts and their interdependence, both with each other and with the decision maker. The decision maker treats the experts' distributions as data, which are used to update the prior distribution.

For automatic document retrieval, the retrieval system is the decision maker and different retrieval algorithms, or models, are the experts (Thompson 1990a,b, 1991). This is referred to as the upper level CEO. At the lower level, the probabilities of individual features, e.g., terms, within a particular retrieval model can be combined using CEO. In lower level CEO the retrieval model is the decision maker and the term probabilities are viewed as lower level experts. The probability distributions supplied by these lower level experts can be updated, according to Bayes' theorem, by user relevance judgments for retrieved documents. These same relevance judgments also give the system a way to evaluate the performance of each model, both in the context of a single search of several iterations and over all searches to date. These results can be used in a statistically sound