NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Edited by D. K. Harman, National Institute of Standards and Technology
Combining Evidence for Information Retrieval
N.J. Belkin, P. Kantor, C. Cool, R. Quatrain
School of Communication, Information & Library Studies
Rutgers University
New Brunswick, NJ 08903 USA
{belkin, kantorp, ccool, quatrain}@cs.rutgers.edu
Abstract
This study investigated the effect on retrieval performance of two methods of combination of multiple representations of TREC topics. Five separate Boolean queries for each of the 50 TREC routing topics and 25 of the TREC ad hoc topics were generated by 75 experienced online searchers. Using the INQUERY retrieval system, these queries were both combined into single queries, and used to produce five separate retrieval results, for each topic. In the former case, results indicate that progressive combination of queries leads to progressively improving retrieval performance, significantly better than that of single queries, and at least as good as the best individual single query formulations. In the latter case, data fusion of the ranked lists also led to performance better than that of any single list.
1. Introduction
The general goal of our project in the TREC-2 program was to investigate the effect of making use of several different formulations of a single information problem on information retrieval (IR) system performance. The basis for this work lies in both theory and empirical evidence. From the empirical point of view, it has been noted for some time that different representations of the same information problem retrieve sets (or ranked lists) of documents which contain different relevant, as well as non-relevant, documents (see, e.g., McGill, Koll & Norreault, 1979; Saracevic & Kantor, 1988). There is some implication from this evidence (made explicit by Saracevic and Kantor, 1988) that taking account of the different results of the different formulations could lead to retrieval performance that is better than that of any of the individual query formulations. From the theoretical point of view, IR can be considered as a problem of inference (see, e.g., van Rijsbergen, 1986). That is, IR is concerned with estimating, given available evidence about such things as information problems and documents (or, in general, retrievable information objects), the likelihood (or probability, or degree) of relevance of a document to the information problem. From this point of view, different query formulations constitute different sources of evidence which could be used to infer the probable relevance of a document to an information problem, and it is thus reasonable to consider ways in which to use (i.e. combine) these sources of evidence in the inference process.
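As a purely illustrative sketch, not taken from the paper: if we assume each evidence source supplies an independent probability that a document is relevant, one simple way to combine the sources into an overall probability of relevance is a "noisy-OR" rule, under which the document is non-relevant only if every source independently fails to indicate relevance.

```python
from math import prod  # Python 3.8+

def noisy_or(probs):
    """Combine independent per-source relevance probabilities.

    Assumes independence of the evidence sources (an idealization):
    P(relevant) = 1 - product over sources of (1 - p_i).
    """
    return 1.0 - prod(1.0 - p for p in probs)

# Two sources estimating relevance at 0.3 and 0.5:
estimate = noisy_or([0.3, 0.5])  # 1 - 0.7 * 0.5 = 0.65
```

Note that the combined estimate is never lower than the strongest single source, which mirrors the intuition in the text that additional evidence should not hurt.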
These ideas are general to any source of evidence which might be used for IR, such as the evidence of different retrieval techniques, or different document representation techniques, or, in general, different IR systems. One aspect of our project uses the example of different query formulations as a simulation of the general problem of combination of evidence from different systems.
An additional argument is available for the special case of different query representations. That is, if we consider an information problem to be a complex and, in general, difficult-to-specify entity (see, e.g., Taylor, 1968; Belkin, Oddy & Brooks, 1982), then we might conclude that each different representation, derived from some statement by the user, is a different interpretation of the user's underlying information problem, highly unlikely to be like anyone else's (or any other system's) interpretation. Given the empirical evidence, whether any one such interpretation is 'better' than another seems moot. However, we might say that each captures some different, yet pertinent, aspect of the user's underlying problem; or, that those aspects of the different interpretations which are common to them all (or to more than one) reflect some 'core' aspect of the problem. Although techniques for making use of the different interpretations might vary according to which of these two views one takes, the general position suggests that it will always be a good idea to take advantage of as many such interpretations as possible. For this case, we therefore consider the issue of combination of different query representations within the 'same' IR system.
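To make the idea of combining query representations concrete, the following is a minimal, hypothetical sketch; the averaging rule and the function names are assumptions for illustration, not the paper's actual INQUERY operators. Each formulation assigns a retrieval score to each document, and the combined query averages those scores, so a document rewarded by several formulations rises in the merged ranking.

```python
def combine_queries(scores_per_query):
    """Combine several query formulations into one scored result.

    scores_per_query: list of dicts, each mapping doc_id -> score
    assigned by one query formulation. A document absent from a
    formulation's result contributes 0.0 for that formulation.
    Returns a dict mapping doc_id -> mean score across formulations.
    """
    all_docs = set().union(*(s.keys() for s in scores_per_query))
    n = len(scores_per_query)
    return {
        doc: sum(s.get(doc, 0.0) for s in scores_per_query) / n
        for doc in all_docs
    }

# Two hypothetical formulations of the same information problem:
q1 = {"d1": 0.9, "d2": 0.4}
q2 = {"d1": 0.5, "d3": 0.8}
merged = combine_queries([q1, q2])
ranking = sorted(merged, key=merged.get, reverse=True)  # d1 first
```

Here "d1", scored by both formulations, outranks documents found by only one, which is the behavior the combination argument above predicts.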
Our project, thus, considers the problem of inference in IR at two levels of analysis. The first level, as introduced by Turtle & Croft (1991), asks about the effect of evidence obtained when two or more formal query statements are produced for the same information problem. The second level, which is simulated in this study, asks about combination of evidence provided by two or more distinct systems, ranking the same set of documents in response to the same problem. To distinguish these two levels, and in keeping with earlier discussions of the issues involved, we henceforth refer to the combination of query statements as "query combination", and we refer to the combination of evidence from differing systems as "data fusion". Others have also addressed various aspects of this general question. Apart from those already cited, we mention in particular the work of Fox and his colleagues (Fox et al., 1993; Fox and Shaw, this volume), and that of Belkin et al. (1993). These studies in fact address precisely the question of query combination, the Belkin et al. work being a direct precursor to this one, and the Fox et al. studies using different query formulation, combination, and retrieval techniques, but with very similar results.
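The data-fusion level can likewise be illustrated with a small, hypothetical sketch; the rank-summing rule below is an assumption chosen for simplicity, not the fusion method evaluated in the paper. Several systems each return a ranked list over the same document set, and the fused ranking orders documents by the sum of their ranks, treating a document missing from a list as ranked just past that list's end.

```python
def fuse_rankings(ranked_lists):
    """Fuse several ranked lists by summing per-list ranks.

    ranked_lists: list of lists of doc_ids, best-first. A document
    absent from a list is penalized with rank len(list) + 1.
    Returns a single fused list, best (lowest rank sum) first.
    """
    all_docs = set().union(*map(set, ranked_lists))

    def total_rank(doc):
        return sum(
            lst.index(doc) + 1 if doc in lst else len(lst) + 1
            for lst in ranked_lists
        )

    return sorted(all_docs, key=total_rank)

# Two hypothetical systems ranking the same documents:
run_a = ["d2", "d1", "d3"]
run_b = ["d1", "d3", "d2"]
fused = fuse_rankings([run_a, run_b])  # d1 rises to the top
```

A document ranked well by both systems ("d1") heads the fused list even though neither system alone placed it first in both runs, matching the evidence-combination intuition developed above.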
Why ought either of these two methods work in the IR situation? The central idea is that either the specific internal score, assigned to a document for a query, or the rank of