SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Description of the PRC CEO Algorithm for TREC-2
chapter
P. Thompson
National Institute of Standards and Technology
D. K. Harman
Paul Thompson
PRC Inc., Mail Stop 5S3
1500 PRC Drive
McLean, VA 22102
Phone: 612/687-4650
E-mail: thompson@research.westlaw.com
(current affiliation: West Publishing Company)
Abstract
This paper describes an application of the
Combination of Expert Opinion technique to
combine the results of multiple retrieval
methods used on the TREC-2 collection.
The methods being combined were weighted
by their TREC-1 performance.
1. Introduction
This paper describes work done on the
TREC-2 project at PRC Inc. in collaboration
with Professor Edward Fox and his
colleagues at Virginia Polytechnic Institute
and State University (VPI&SU). The reader
should refer to the description of their
system included in these working notes for
further details on the common processing of
the TREC-2 data shared by PRC and
VPI&SU (Fox et al. 1993). PRC used its
algorithm, the Combination of Expert
Opinion (CEO), to combine the results of
VPI&SU's runs. VPI&SU used a different
combination technique for their final results.
Originally the intent was that the CEO
algorithm would be integrated with the
SMART system used by VPI&SU. Combination
of results would take place at two levels:
the lower level of individual document
features within a particular retrieval
method, and the upper
level of combination of the output of the
individual methods themselves, i.e., the
various cosine and p-norm methods used by
VPI&SU. For TREC-1 we were not able to
train the CEO algorithm, i.e., to optimize
the weighting of the various methods based
on relevance judgments. For TREC-2 we
weighted each method by the 11-point
average score it obtained on TREC-1. Again
we used only the upper level of CEO. For
TREC-1 we
found that combining all methods resulted in
lower performance than using the single best
method. This year our first version
combined the top two methods, as ranked by
their TREC-1 performance, while the second
version combined the top five methods.
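
As an illustration only, the upper-level weighting described above can be
sketched as follows. This is a plain weighted average of per-method document
scores, not the CEO algorithm itself, whose details are not given in this
section. The function name, the method names, and all scores are hypothetical,
and the sketch assumes that scores from different methods are comparable.

```python
from collections import defaultdict


def combine_top_methods(run_scores, prior_avg, k):
    """Weighted-average fusion of the k methods with the best prior scores.

    run_scores: {method: {doc_id: score}}; scores are assumed comparable.
    prior_avg:  {method: 11-point average from an earlier evaluation}.
    """
    # Rank methods by prior performance and keep the top k.
    chosen = sorted(prior_avg, key=prior_avg.get, reverse=True)[:k]
    total_weight = sum(prior_avg[m] for m in chosen)

    combined = defaultdict(float)
    for method in chosen:
        # Normalize so the weights of the chosen methods sum to 1.
        weight = prior_avg[method] / total_weight
        for doc_id, score in run_scores[method].items():
            # A document missing from a run contributes 0 for that run.
            combined[doc_id] += weight * score

    # Highest combined score first; ties broken by document id.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical method names, prior scores, and document scores.
prior_avg = {"cosine_a": 0.30, "pnorm_1.5": 0.20, "pnorm_2.0": 0.10}
run_scores = {
    "cosine_a": {"d1": 0.9, "d2": 0.4},
    "pnorm_1.5": {"d2": 0.8, "d3": 0.6},
    "pnorm_2.0": {"d3": 1.0},
}
ranking = combine_top_methods(run_scores, prior_avg, k=2)
```

With k=2 only the two methods with the highest prior scores contribute, which
corresponds to the first version described above; k=5 would correspond to the
second.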
2. Combination of Expert Opinion
The statistical technique of CEO provides a
solution to the problem of combining
different probabilistic models of document
retrieval. This technique is expected to
result in improved precision and recall over
that provided by any one model, or method,
since research has shown that various
retrieval models retrieve different sets of
more or less equally relevant documents
(Katzer et al. 1982, Fox et al. 1988). In the