The use of Bayesian networks is similar to that of expert system technologies. Given a Bayesian network (i.e., a knowledge base) for a class of situations and evidence (i.e., facts) about a particular situation of that class, conclusions can be drawn about that situation. The technology has been used in a wide variety of applications, including medical diagnosis, military situation assessment, and machine vision.

A Bayesian network is a directed acyclic graph in which each node represents a random variable (i.e., a set of mutually exclusive and collectively exhaustive propositions). Each set of arcs into a node represents a probabilistic dependence between the node and its predecessors (the nodes at the other ends of those arcs). The primary technical innovation of Bayesian networks is the representation, through the network's structure, of the conditional independence relations between the variables in a network. This innovation not only provides an intuitive representation for acquiring probabilistic information but also renders inference tractable for a large number of real-world situations.

Inferences can be drawn from a Bayesian network with a wide variety of algorithms. There are exact algorithms (Lauritzen & Spiegelhalter, 1988; Shachter, 1986; Shachter, D'Ambrosio, & Del Favero, 1990) as well as approximate algorithms (Fung & Chang, 1989). Inference algorithms compute a probability distribution over some combination of variables in the network, given all the evidence represented in the network.

Since the introduction of Bayesian network technology, several efforts have been made to apply it to information retrieval (Fung, Crawford, Appelbaum, & Tong, 1990; Turtle & Croft, 1990), and the results are promising. However, several recent innovations in Bayesian network technology have not yet been applied to information retrieval. These innovations involve the representation of conditional independence relations that are finer than those currently represented at the network level. One of these innovations is Similarity Networks (Heckerman, 1991), developed by David Heckerman at Stanford University. Another innovation for representing the relationship between variables, the "co-occurrence diagram," was developed on this project.

2.3 Probabilistic Information Retrieval

Most probabilistic information retrieval techniques can be illustrated graphically through the use of Bayesian networks. In a simple formulation, there are n topics of interest (t1, ..., tn) and m identifiable document features (f1, ..., fm). The information retrieval problem is to compute the posterior probability p(ti | f1, ..., fm) for each topic, given a quantification (i.e., the joint probability p(t1, ..., tn)) of the frequency with which the topics appear in the corpus and a quantification (i.e., the conditional probability p(f1, ..., fm | t1, ..., tn)) of the relationship between the "presence" of a topic in a document and the "presence" of features.
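To make this computation concrete, the following is a minimal sketch, written for this discussion rather than taken from the paper or its Appendix, of how the posterior p(t | f1, ..., fm) can be computed for a single binary topic when the features are assumed conditionally independent given the topic, as in the two-level model of Figure 2.1 described below. The function name and all probability values are illustrative.

# A minimal sketch (not from the paper or its Appendix) of the posterior
# computation p(t | f1, ..., fm) for one binary topic, assuming the features
# are conditionally independent given the topic (the two-level model).
# All probability values below are illustrative.

def posterior_topic_probability(prior_t, p_f_given_t, p_f_given_not_t, observed):
    """Return p(t = relevant | observed feature values)."""
    like_t = prior_t            # running product p(t) * prod_i p(fi value | t)
    like_not_t = 1.0 - prior_t  # running product p(not-t) * prod_i p(fi value | not-t)
    for p1, p0, present in zip(p_f_given_t, p_f_given_not_t, observed):
        # Multiply in p(fi | t) if the feature is present, 1 - p(fi | t) if absent.
        like_t *= p1 if present else (1.0 - p1)
        like_not_t *= p0 if present else (1.0 - p0)
    return like_t / (like_t + like_not_t)  # normalize over {t, not-t}

# One topic and three features, as in the model of Figure 2.1 (values made up):
print(posterior_topic_probability(
    prior_t=0.1,                      # p(t)
    p_f_given_t=[0.8, 0.6, 0.5],      # p(fi | t)
    p_f_given_not_t=[0.2, 0.3, 0.4],  # p(fi | not-t)
    observed=[True, True, False]))    # f1, f2 present; f3 absent

The normalization in the final line corresponds to the Bayes' rule inversion discussed below: the products p(t) * prod_i p(fi | t) and p(not-t) * prod_i p(fi | not-t) are rescaled to sum to one, yielding the posterior distribution on the topic node t.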
Figure 2.1 is a Bayesian network representing a retrieval model with one topic of interest and three features. The node t represents the events "the document is relevant to topic t" and "the document is not relevant to topic t." The nodes fi represent the events "the feature fi is present in the document" and "the feature fi is not present (is absent) in the document." The prior probabilities of the events t and not-t are stored in the t node. The conditional probabilities p(fi | t) are stored in each of the feature nodes.

Figure 2.1: The two-level Bayesian network model of information retrieval

Because of the lack of arcs between the feature nodes, this diagram embodies the assumption, used in many probabilistic systems, that the features are conditionally independent of each other given the topic. Using the Bayesian network form of Bayes' rule called arc reversal (Shachter, 1986), the posterior probability of the event t can be computed. The inversion formulas are straightforward and computationally feasible (they are described in the Appendix). Figure 2.2 shows the network of Figure 2.1 with the arcs reversed. It represents a model in which knowing whether one or more of the features are present provides information (i.e., the posterior distribution on node t) about whether the topic is relevant to the document.

Figure 2.2: The information retrieval model after Bayesian inversion

To address multiple topics within this framework, a model similar to the one represented by Figure 2.1 would be generated for each topic, and inference would be carried out on each topic separately. However, these multiple models