SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Combining Evidence for Information Retrieval
N. Belkin
P. Kantor
C. Cool
R. Quatrain
National Institute of Standards and Technology
D. K. Harman
a document in the list produced for a query, represents in-
formation about the relevance of the document to the query.
For Boolean retrieval, we may address this question with
concepts of signal detection. In this framework, there are
two conditional probabilities. The probability that a rele-
vant document is retrieved by system S is $d_S$. The proba-
bility that a not relevant document is retrieved is $f_S$. If two
systems (or formulations) are independent, the posterior rel-
evance odds are increased by the factor $d_1 d_2 / (f_1 f_2)$. In ac-
tual application (Saracevic and Kantor, 1988), improve-
ments are not this large, suggesting either the existence of
an effective base of not-relevant documents, or some effect
of interdependence. It can be shown that if several query
formulations are drawn from a normal distribution centered
at the optimal query formulation, then some fraction of the
time, the simple average of these formulations will be
closer to the optimum than even the best of them. An even
larger fraction of the time, there will be an optimum linear
combination which is more nearly optimal than any of the
cases from which it is formed (Kantor, 1993).
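The two arguments above can be illustrated with a minimal Python sketch. It is not taken from the cited work: the function names, the unit-variance normal distribution, the Euclidean distance measure, and the choice of five formulations in ten dimensions are all assumptions made for illustration.

```python
import math
import random


def posterior_odds(prior_odds, d, f):
    """Relevance odds for a document retrieved by every listed system.

    d[i] is the probability that system i retrieves a relevant document,
    f[i] the probability that it retrieves a not relevant one. Assuming
    the systems are independent, the ratios d[i] / f[i] multiply.
    """
    ratio = 1.0
    for d_i, f_i in zip(d, f):
        ratio *= d_i / f_i
    return prior_odds * ratio


def norm(vector):
    """Euclidean distance of a vector from the origin."""
    return math.sqrt(sum(x * x for x in vector))


def fraction_mean_beats_best(k=5, dim=10, trials=2000, seed=0):
    """Estimate how often the simple average of k formulations lies
    closer to the optimum (placed at the origin, an assumption) than
    the best single formulation does."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each formulation is a point drawn around the optimum.
        points = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(k)]
        mean = [sum(column) / k for column in zip(*points)]
        best = min(norm(p) for p in points)
        if norm(mean) < best:
            wins += 1
    return wins / trials
```

For example, `posterior_odds(0.1, [0.6, 0.5], [0.1, 0.2])` multiplies prior odds of 0.1 by (0.6/0.1)(0.5/0.2) = 15. The simulated fraction depends strongly on the assumed dimensionality and number of formulations, so it should be read as a demonstration of the effect rather than as an estimate for real queries.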
The existence of such models explains why we might
expect combination of evidence, or data fusion, to work for
the case of several query formulations, as, for instance, in
the INQUERY retrieval system (Turtle & Croft, 1991).
But these models do not predict that these techniques must
work. Whether they do in fact work is the
subject of this paper.
Specifically, we investigate three questions: whether data fusion
methods produce better performance than any single method;
whether combination of query formulations does better
than the best individual query formulation; and whether
progressive combination of query formulations leads to
progressively better IR performance. For each of these
questions, we also address the issue of what methods to use
in the combination of evidence.
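To make the notion of a data fusion method concrete, the following Python sketch merges several ranked lists by summing normalised scores. This is one common rule chosen purely for illustration and is not claimed to be the method used in this study. The function name, the (doc_id, score) input format, and the min-max normalisation are all assumptions.

```python
from collections import defaultdict


def fuse_ranked_lists(ranked_lists, top_n=None):
    """Merge several ranked lists of (doc_id, score) pairs into one.

    Each list's scores are min-max normalised to [0, 1] and then summed
    per document; a document missing from a list contributes 0 for it.
    """
    fused = defaultdict(float)
    for results in ranked_lists:
        if not results:
            continue
        scores = [score for _, score in results]
        low, high = min(scores), max(scores)
        span = high - low
        for doc_id, score in results:
            # A list whose scores are all equal gives every document 1.0.
            fused[doc_id] += (score - low) / span if span else 1.0
    # Sort by fused score (descending), breaking ties by document id.
    merged = sorted(fused.items(), key=lambda item: (-item[1], item[0]))
    return merged[:top_n] if top_n is not None else merged
```

Under this rule a document retrieved by several lists tends to rise above one retrieved by a single list, which is the intuition behind combining evidence. Other rules (for instance taking the maximum score, or interleaving by rank) would slot into the same structure.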
In this paper, we do not discuss the "official" results
which we submitted to TREC-2, except in passing. The
reason for this is that we are not so much interested in the
absolute performance of the techniques which we use, as in
their performance relative to one another. What we are
most concerned with is what happens to retrieval perfor-
mance as we combine evidence; if we find that combining
evidence in specific ways leads to improvements over our
starting point of non-combination, then we can begin to
investigate how to optimize starting points, as well as rules
for combination.
The general plan of our study was as follows. We col-
lected, from experienced online searchers, five different
query formulations for each of the 50 routing topics and for
25 of the ad hoc topics. These query formulations were
then put to the INQUERY retrieval system (made available
to us by the University of Massachusetts), both as single
queries, and as combinations of queries for each topic. The
combinations were studied at various levels, with the five-
fold combination for each set being reported as "official"
TREC-2 results for query combination. The five retrieved
lists for the ad hoc topics were merged, and reported as
"official" TREC-2 results for data fusion.
2. Methods
2.1 Query Formulation Procedures
The query formulations used in this study were gener-
ated by volunteer online searchers, all of whom were expe-
rienced users of large bibliographic retrieval systems. In
order to obtain the multiple query representations, we asked
five different searchers to generate Boolean search state-
ments for each of the TREC topics in our analysis. We
asked each of our volunteer searchers to generate a query
formulation for five different topics, resulting in five inde-
pendently generated query formulations for each topic. Af-
ter formulating each query, searchers were asked to answer
four questions about the process: how long it took to for-
mulate the query; how related the topic was to their normal
searches; how easy it was for them to formulate the query;
and, the extent to which they had enough information to
construct the query. A total of 75 searchers participated in
our study: 50 for the routing topics, and 25 for the ad hoc
topics. In addition to the questionnaire items mentioned
above, the ad hoc searchers were also asked how many years
of online searching experience they had. Searchers for the
routing queries were not asked this question. See the Ap-
pendix for a sample response sheet.
Our study is based on analysis of the entire set of 50
routing topics, and a selected sample of 25 ad hoc topics.
The sample was stratified according to the domain of the
topic, in an effort to represent the distribution of domains
in the entire set of ad hoc topics.
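A stratified selection of this kind can be sketched in Python as follows. The procedure shown (proportional allocation with largest-remainder rounding, then random choice within each domain) is an assumption for illustration, since the exact selection procedure is not specified here. The function name and the topic-to-domain mapping are likewise hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(topic_domains, sample_size, seed=0):
    """Pick sample_size topics so that each domain's share of the sample
    roughly matches its share of the full topic set."""
    by_domain = defaultdict(list)
    for topic, domain in sorted(topic_domains.items()):
        by_domain[domain].append(topic)
    total = len(topic_domains)
    # Ideal (possibly fractional) number of topics per domain.
    quotas = {d: sample_size * len(t) / total for d, t in by_domain.items()}
    counts = {d: int(q) for d, q in quotas.items()}
    leftover = sample_size - sum(counts.values())
    # Hand remaining slots to the domains with the largest fractional parts.
    order = sorted(quotas, key=lambda d: (counts[d] - quotas[d], d))
    for d in order[:leftover]:
        counts[d] += 1
    rng = random.Random(seed)
    sample = []
    for d in sorted(by_domain):
        sample.extend(rng.sample(by_domain[d], counts[d]))
    return sorted(sample)
```

With 50 topics split 20/20/10 across three domains, a sample of 25 would contain 10, 10, and 5 topics from those domains respectively.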
In our experiments, we used the INQUERY retrieval engine
(version 1.5), developed at the University of Massachusetts
(Turtle & Croft, 1991). INQUERY is a probabilistic
inference network system, built on the idea of combining
multiple sources of evidence in order to plausibly infer the
relevance of a document to a query. The underlying formalism
is that of a Bayesian
probabilistic inference network (Pearl, 1988), which pro-
vides strict rules for how to combine sources of evidence.
Turtle and Croft (1991) give a detailed description of the
model and its implementation; a more general description
is available in Belkin and Croft (1992). Here, we note a
few characteristics of the system which are germane to the
project at hand.
First, INQUERY provides a natural means for combi-
nation of multiple query formulations, as a function of its
design. Second, it incorporates a large set of operators
which allow, in addition to sophisticated natural language
query formulations, complex Boolean formulations. The
Boolean operators in INQUERY are not strict, however,
which allows ranking of output, and also leads to signifi-
cantly better performance than strict Boolean retrieval
(Turtle and Croft, 1991). See the paper by Croft in this
volume for more detail on INQUERY.
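The difference between strict and non-strict Boolean operators can be illustrated with a short Python sketch. It assumes the closed-form belief functions described for the inference network model by Turtle and Croft (1991): product for AND, complement of the product of complements for OR, complement for NOT, and the mean for a sum of evidence. The function names and term beliefs are hypothetical, and the actual INQUERY 1.5 implementation may differ in detail.

```python
from functools import reduce


def bel_and(beliefs):
    """Soft AND: product of the child beliefs."""
    return reduce(lambda acc, p: acc * p, beliefs, 1.0)


def bel_or(beliefs):
    """Soft OR: one minus the product of the complements."""
    return 1.0 - reduce(lambda acc, p: acc * (1.0 - p), beliefs, 1.0)


def bel_not(belief):
    """Soft NOT: complement of the child belief."""
    return 1.0 - belief


def bel_sum(beliefs):
    """Unweighted sum of evidence: mean of a non-empty list of beliefs."""
    return sum(beliefs) / len(beliefs)


# Hypothetical term beliefs for one document, each in [0, 1].
term = {"fusion": 0.8, "retrieval": 0.6, "boolean": 0.4}

# (fusion AND retrieval) OR boolean
score = bel_or([bel_and([term["fusion"], term["retrieval"]]), term["boolean"]])
```

Because every operator returns a value between 0 and 1 rather than true or false, documents that only partly satisfy a Boolean formulation still receive a score (about 0.688 in this example) and can be ranked. Under the same assumptions, several formulations of one topic could be combined by applying `bel_sum` to their individual scores.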
2.2 Query Combination Experiments
Each of the Boolean query formulations produced by
our searchers was translated into INQUERY syntax. Two
methods of query combination were then used in our study,
each specific to the TREC-2 tasks of responding to ad hoc