SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Description of the PRC CEO Algorithm for TREC
P. Thompson
National Institute of Standards and Technology
Donna K. Harman
Paul Thompson
PRC Inc., Mail Stop 5S3
1500 Planning Research Drive
McLean, VA 22102
Phone: 703/556-1923
Email: thompson[OCRerr]aul@po.gis.prc.com
This paper describes work done on the TREC project at PRC Inc. in collaboration with
Professor Edward Fox and his colleagues at Virginia Polytechnic Institute and State
University (VPI&SU). The reader should refer to the description of their system included
in these proceedings for further details on the common processing of the TREC data shared
by PRC and VPI&SU (Fox et al. 1993). PRC developed an algorithm, the Combination
of Expert Opinion (CEO), which combined the results of VPI&SU's runs. VPI&SU used
a different combination technique for their final results. The original intent was to
integrate the CEO algorithm with the SMART system used by VPI&SU, so that results would
be combined at two levels: a lower level, combining individual document features within a
particular retrieval method, and an upper level, combining the output of the individual
methods themselves, i.e., the various cosine and p-norm methods used by VPI&SU.
Furthermore, we had originally hoped to train the
CEO algorithm, so that the weighting of the various methods would be optimized based on
relevance judgments. For the official TREC results we were only able to use the upper
level of CEO without any training. Since then we have done additional retrospective
experiments in which the different methods are weighted in the CEO algorithm by one of
several measures of their performance for TREC.
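
As an illustration of what weighting methods by a performance measure could look like, the
following minimal sketch fuses the outputs of several runs into one ranking. It is not the
CEO algorithm as implemented at PRC. It assumes that each method reports a probability of
relevance per document, that the weights are a normalized performance score such as average
precision, and that a document missing from a run receives a small floor probability. The
names fuse_runs, logit, and missing_prob are hypothetical.

```python
import math


def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to keep the logarithm finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def fuse_runs(runs, performance, missing_prob=0.01):
    """Rank documents by a performance-weighted mean of per-method log-odds.

    runs:        {method_name: {doc_id: probability_of_relevance}}
    performance: {method_name: non-negative performance score}
    missing_prob is an assumed floor for documents a method did not retrieve.
    """
    total = sum(performance[m] for m in runs)
    weights = {m: performance[m] / total for m in runs}

    # Every document retrieved by at least one method is a candidate.
    doc_ids = set().union(*(run.keys() for run in runs.values()))

    fused = {
        doc: sum(weights[m] * logit(runs[m].get(doc, missing_prob)) for m in runs)
        for doc in doc_ids
    }
    return sorted(fused, key=fused.get, reverse=True)


runs = {
    "cosine": {"d1": 0.9, "d2": 0.2},
    "pnorm": {"d2": 0.8, "d3": 0.6},
}
weighted = fuse_runs(runs, {"cosine": 0.3, "pnorm": 0.1})
unweighted = fuse_runs(runs, {"cosine": 1.0, "pnorm": 1.0})
```

With equal weights the sketch corresponds to an untrained combination; unequal weights let
the better-performing method dominate the fused ranking.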
Combination of Expert Opinion
The statistical technique of CEO provides a solution to the problem of combining different
probabilistic models of document retrieval. This technique is expected to result in
improved precision and recall over that provided by any one model, or method, since
research has shown that various retrieval models retrieve different sets of relevant
documents (Katzer et al. 1982, Fox et al. 1988). In the Bayesian formulation of the CEO
problem (Lindley 1983), a decision maker is interested in some parameter or event and
has a prior, or initial, distribution or probability for that parameter or event. The
decision maker revises the distribution upon consulting several experts, each with his/her
own distribution or probability for the parameter or event. To effect this revision, the
decision maker must assess the relative expertise of the experts and their interdependence,
both with each other and with the decision maker. The decision maker treats the experts'
distributions as data and uses them to update the prior distribution.
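
The updating step described above can be sketched under strong simplifying assumptions
that are not taken from this paper: the quantity of interest is the log-odds of relevance,
the decision maker's prior on it is normal, and each expert's reported log-odds is treated
as an independent, normally distributed observation whose variance reflects that expert's
assessed expertise. Independence in particular ignores the interdependence among experts
that the decision maker is supposed to assess. The function and parameter names are
hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def sigmoid(x):
    """Inverse of logit: maps log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def combine_expert_opinions(prior_prob, prior_var, expert_probs, expert_vars):
    """Normal-normal conjugate update on the log-odds scale.

    prior_prob, prior_var: decision maker's prior probability and the
        variance of the prior on its log-odds.
    expert_probs: probabilities reported by the experts.
    expert_vars: assumed variances of each expert's log-odds report;
        a smaller variance means the expert is trusted more.
    Returns (posterior probability, posterior variance of the log-odds).
    """
    precision = 1.0 / prior_var
    weighted_sum = precision * logit(prior_prob)

    # Each expert's report is data: precisions add, and the reports are
    # averaged with precision weights.
    for p, var in zip(expert_probs, expert_vars):
        precision += 1.0 / var
        weighted_sum += logit(p) / var

    posterior_mean = weighted_sum / precision
    return sigmoid(posterior_mean), 1.0 / precision


# Two equally trusted experts who both report 0.8, with a neutral prior.
posterior, posterior_var = combine_expert_opinions(0.5, 1.0, [0.8, 0.8], [1.0, 1.0])
```

In this simplified form the neutral prior pulls the combined probability below the experts'
shared estimate, and every consulted expert reduces the posterior variance.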
For automatic document retrieval, the retrieval system is the decision maker and different
retrieval algorithms, or models, are the experts (Thompson 1990a,b, 1991). This is
referred to as the upper level CEO. At the lower level the probabilities of individual
features, e.g., terms, within a particular retrieval model can be combined using CEO. In
lower level CEO the retrieval model is the decision maker and the term probabilities are
viewed as lower level experts. The probability distributions supplied by these lower level
experts can be updated, according to Bayes' theorem, by user relevance judgments for
retrieved documents. These same relevance judgments also give the system a way to
evaluate the performance of each model, both in the context of a single search of several
iterations and over all searches to date. These results can be used in a statistically sound