NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Bayesian Inference with Node Aggregation for Information Retrieval
B. Del Favero
R. Fung
National Institute of Standards and Technology, D. K. Harman (Ed.)
The use of Bayesian networks is similar to that of expert system technologies: given a
Bayesian network (i.e., a knowledge base) for a class of
situations and evidence (i.e., facts) about a particular
situation of that class, conclusions can be drawn about that
situation. The technology has been used in a wide variety
of applications, including medical diagnosis, military situation
assessment, and machine vision.
A Bayesian network is a directed acyclic graph where each
node represents a random variable (i.e., a set of mutually
exclusive and collectively exhaustive propositions). The
set of arcs into a node represents a probabilistic dependence
between that node and its predecessors (the nodes at the
other ends of the arcs). The primary technical innovation of
Bayesian networks is the representation, through the
network's structure, of conditional independence relations
between the variables in a network. This innovation not
only provides an intuitive representation for acquiring
probabilistic information but also renders inference
tractable in a large number of real-world situations.
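As a purely illustrative sketch of this representation (the network, variable names, and probabilities below are hypothetical, not taken from the paper), a small two-feature network can be encoded as a table of nodes, each listing its parents and a conditional probability table, with the joint distribution recovered as the product of the per-node conditionals:

# Illustrative sketch only: a tiny Bayesian network encoded as a directed
# acyclic graph with a conditional probability table (CPT) per node. The
# variables ("topic", "feature_a", "feature_b") and all numbers are invented.
network = {
    "topic":     {"parents": [],        "cpt": {():      0.01}},
    "feature_a": {"parents": ["topic"], "cpt": {(True,): 0.70, (False,): 0.05}},
    "feature_b": {"parents": ["topic"], "cpt": {(True,): 0.40, (False,): 0.10}},
}

def joint_probability(assignment):
    # P(assignment) factored along the network structure: the product over
    # nodes of P(node | parents), which is exactly the conditional-independence
    # assumption the graph encodes.
    p = 1.0
    for name, node in network.items():
        parent_values = tuple(assignment[parent] for parent in node["parents"])
        p_true = node["cpt"][parent_values]
        p *= p_true if assignment[name] else 1.0 - p_true
    return p

# Example: probability that the topic is present and both features occur.
print(joint_probability({"topic": True, "feature_a": True, "feature_b": True}))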
Inferences can be drawn from a Bayesian network with a
wide variety of algorithms. There are exact algorithms
(Lauritzen & Spiegelhalter, 1988; Shachter, 1986; Shachter,
D'Ambrosio, & Del Favero, 1990) as well as approximate
algorithms (Fung & Chang, 1989). Inference algorithms
compute a probability distribution over some combination of
variables in the network, given all the evidence represented
in the network.
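As a hedged illustration of what such an algorithm computes (this is a generic enumeration sketch, not one of the cited algorithms, and the chain network and numbers are invented), the posterior of one variable given evidence on another can be obtained by summing the joint distribution over the unobserved variables and normalizing:

from itertools import product

# Illustrative sketch only: exact inference by enumeration on a hypothetical
# chain network a -> b -> c. All probabilities are invented.
parents = {"a": [], "b": ["a"], "c": ["b"]}
cpts = {
    "a": {():      0.20},                  # P(a = True)
    "b": {(True,): 0.90, (False,): 0.30},  # P(b = True | a)
    "c": {(True,): 0.80, (False,): 0.10},  # P(c = True | b)
}

def joint(assignment):
    # Product over nodes of P(node | parents), following the network structure.
    p = 1.0
    for var, table in cpts.items():
        p_true = table[tuple(assignment[pa] for pa in parents[var])]
        p *= p_true if assignment[var] else 1.0 - p_true
    return p

def posterior(query_var, evidence):
    # P(query_var = True | evidence): sum the joint over the unobserved
    # variables, then normalize.
    hidden = [v for v in cpts if v not in evidence and v != query_var]
    weight = {True: 0.0, False: 0.0}
    for q in (True, False):
        for values in product((True, False), repeat=len(hidden)):
            assignment = dict(evidence)
            assignment[query_var] = q
            assignment.update(zip(hidden, values))
            weight[q] += joint(assignment)
    return weight[True] / (weight[True] + weight[False])

# Observing c raises the belief in a (from the 0.20 prior to about 0.37 here),
# with the influence flowing through the intermediate variable b.
print(posterior("a", {"c": True}))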
Since the introduction of the Bayesian network technology,
several efforts have been made to apply it to information
retrieval (Fung, Crawford, Appelbaum, & Tong, 1990;
Turtle & Croft, 1990). The results are promising.
However, several recent innovations have been made in the
Bayesian network technology that have not yet been
applied to information retrieval. The innovations involve
the representation of conditional independence relations
that are finer than those currently represented at the
network level. One of these innovations is Similarity
Networks (Heckerman, 1991) developed by David
Heckerman at Stanford University. Another innovation for
representing the relationship between variables, the "co-
occurrence diagram," was developed on this project.
2.3 Probabilistic Information Retrieval
Most probabilistic information retrieval techniques can be
illustrated graphically through the use of Bayesian
networks. In a simple formulation, there are n topics of
interest (t_1, ..., t_n) and m identifiable document features
(f_1, ..., f_m). The information retrieval problem is to
compute the posterior probability p(t_i | f_1, ..., f_m) for each
topic, given a quantification (i.e., the joint probability
p(t_1, ..., t_n)) of how frequently the topics appear in the corpus
and a quantification (i.e., the conditional probability
p(f_1, ..., f_m | t_1, ..., t_n)) of the relationship between the
"presence" of a topic in a document and the "presence" of
features.
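Written out explicitly (the paper leaves this formula implicit, and its notation may differ slightly), the computation is Bayes' rule with a marginalization over the remaining topic variables:

p(t_i \mid f_1, \ldots, f_m)
  = \frac{\sum_{t_j,\; j \neq i} p(t_1, \ldots, t_n)\, p(f_1, \ldots, f_m \mid t_1, \ldots, t_n)}
         {\sum_{t_1, \ldots, t_n} p(t_1, \ldots, t_n)\, p(f_1, \ldots, f_m \mid t_1, \ldots, t_n)}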
Figure 2.1 is a Bayesian network representing a retrieval
model with one topic of interest and three features. The
node t represents the events "the document is relevant to
topic t" and "the document is not relevant to topic t." The
nodes f_i represent the events "the feature f_i is present in the
document" and "the feature f_i is not present (is absent) in
the document." The prior probabilities of the events t and
not-t are stored in the t node. The conditional probabilities
p(f_i | t) are stored in each of the feature nodes.
Figure 2.1: The two-level Bayesian network model of
information retrieval
Because of the lack of arcs between the feature nodes, this
diagram embodies the assumption, used in many
probabilistic systems, that the features are conditionally
independent of each other given the topic.
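Stated symbolically (the paper leaves the formula implicit), the assumption is that the joint likelihood of the features factors given the topic:

p(f_1, f_2, f_3 \mid t) = \prod_{i=1}^{3} p(f_i \mid t)

and likewise when conditioned on not-t.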
Using the Bayesian network form of Bayes' rule called arc
reversal (Shachter, 1986), the posterior probability of the
event t can be computed. The inversion formulas are
straightforward and computationally feasible (they are
described in the Appendix). Figure 2.2 shows the network
in Figure 2.1 with the arcs reversed. It represents a model
in which knowing whether one or more of the features are
present provides information (i.e., the posterior distribution
on node t) about whether the topic is relevant to the
document.
Figure 2.2: The information retrieval model after
Bayesian inversion
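A minimal numerical sketch of this inverted model, assuming the conditional-independence structure of Figure 2.1 and using invented probabilities (the Appendix formulas themselves are not reproduced here):

# Illustrative sketch of the inversion in Figure 2.2: Bayes' rule applied to
# the one-topic, three-feature model of Figure 2.1. All numbers are invented.
p_topic = 0.05             # prior P(t): the document is relevant to topic t
p_feature_given_topic = {  # P(f_i present | t), P(f_i present | not-t)
    "f1": (0.60, 0.02),
    "f2": (0.30, 0.05),
    "f3": (0.80, 0.10),
}

def posterior_topic(observed):
    # P(t | f_1, f_2, f_3) by Bayes' rule, multiplying the per-feature
    # likelihoods because the features are independent given the topic.
    like_t, like_not_t = 1.0, 1.0
    for name, present in observed.items():
        p_if_t, p_if_not_t = p_feature_given_topic[name]
        like_t *= p_if_t if present else 1.0 - p_if_t
        like_not_t *= p_if_not_t if present else 1.0 - p_if_not_t
    numerator = p_topic * like_t
    return numerator / (numerator + (1.0 - p_topic) * like_not_t)

# Two of the three features are observed in the document; the posterior
# probability of relevance rises from the 0.05 prior to roughly 0.90.
print(posterior_topic({"f1": True, "f2": False, "f3": True}))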
To address multiple topics within this framework, a model
similar to the one represented by Figure 2.1 would be
generated for each topic, and inference would be carried out
on each topic separately. However, these multiple models