Third, the priors are widely divergent from the intuition of any user who reads a newspaper. The numbers suggest, for instance, that about 1 out of every 100 articles in the WSJ (and, by analogy, in the SJMN) is relevant to topic 57, "MCI." Any reader of the WSJ or the SJMN knows that this estimate is much too high: the true relative frequency is probably closer to 1 in 500 or 1 in 1000 than to 1 in 100.

Thus, the prior probabilities were assessed manually. We assume that the user has some knowledge of the test domain (in this case, articles in the SJMN) and can, with some thought, assess the relative frequencies of the various states as part of specifying the query for a particular routing request. For each state, we ask the user for the average number of weeks between the publication of articles relevant to that state. This number is presented in the third column of Table 3.4. It can be converted to a prior probability by combining it with the assumption that there are 1000 documents per week: a state whose relevant articles appear, on average, once every 10 weeks has a prior of 1/(10 × 1000) = 0.0001. The prior probability of U, p(U), is calculated as one minus the sum of the priors of all the other states, ensuring that the probabilities of all states sum to one. (A short sketch of this arithmetic appears at the end of Section 3.2.4.)

3.2.4 Feature Conditional Probabilities

The inference algorithm requires, for each feature and for each state, the conditional probability of the feature given the state. These probabilities cannot be obtained directly from the relative frequencies observed in the training set, because few documents are relevant to more than one topic at a time. We approximate these probabilities using a structure called a noisy-or gate.

The noisy-or gate combines the effects of two or more factors, each of which may contribute to the presence of a feature. It is a model of disjunctive interaction, as described in (Pearl, 1988). It has been used in medical decision research to calculate the probability that a particular symptom is present, given the diseases that cause the symptom (Heckerman, 1989). In the context of information retrieval, a feature may be present due to any of the topics that are relevant in the state.

For each state-feature pair, we build a noisy-or model. The contributing factors are the topics that are relevant within the state; the effect is the feature's presence or absence in the document. For example, consider a feature f and the state (57-97 98). The feature may be present due to topic 57 or topic 98. It cannot be present due to topic 97, because that topic is not relevant within the state. Let E1 be the event that the feature is present due to topic 57, and let E2 be the event that the feature is present due to topic 98. Table 3.5 lists all the possible cases of the two uncertain events. Figure 3.4 shows the belief network structure of the noisy-or model. The node with the double wall is a deterministic logical-or gate.

    Feature present   Feature present   Feature present
    due to 57 (E1)    due to 98 (E2)    at all (E1 OR E2)
    ---------------   ---------------   -----------------
    Yes               Yes               Yes
    Yes               No                Yes
    No                Yes               Yes
    No                No                No

    Table 3.5: Possible cases for a noisy-or node

[Figure 3.4: Belief network corresponding to a noisy-or gate model]

The only case in Table 3.5 in which the feature is absent is the fourth case. Thus, the conditional probability that the feature is absent, given this state, is the probability of that fourth case. The probability that the feature is present is one minus the probability that the feature is absent.
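Two short sketches may make the preceding assessments concrete. The first is the prior-elicitation arithmetic of the previous subsection, under the stated assumption of 1000 documents per week; the state labels and interval values below are hypothetical stand-ins for the entries of Table 3.4.

    # Hypothetical elicitation: average number of weeks between
    # relevant articles, one entry per state (stand-ins for Table 3.4).
    weeks_between = {
        "(57 97 98)": 2000.0,
        "(57 97-98)": 1000.0,
        "(57-97 98)": 500.0,
        "(57 74)": 200.0,
    }
    DOCS_PER_WEEK = 1000  # assumption stated in the text

    def prior_from_interval(weeks):
        """One relevant article every `weeks` weeks, at 1000 docs/week."""
        return 1.0 / (weeks * DOCS_PER_WEEK)

    priors = {s: prior_from_interval(w) for s, w in weeks_between.items()}
    # The irrelevant state U absorbs the remaining probability mass,
    # so that the priors of all states sum to one.
    priors["U"] = 1.0 - sum(priors.values())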
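The second sketch is the noisy-or computation just derived: the feature is absent only if no relevant topic produces it (the fourth row of Table 3.5), so the probability of presence is one minus the product of the per-cause absence probabilities. The per-topic "present due to" values here are hypothetical.

    # Hypothetical probabilities that the feature is present due to
    # each topic; in the actual system these come from the training set.
    p_due_to = {57: 0.20, 98: 0.05}

    def noisy_or(p_causes):
        """P(feature present | state) under disjunctive interaction:
        the feature is absent only if every cause fails to produce it."""
        p_absent = 1.0
        for p in p_causes:
            p_absent *= 1.0 - p  # this cause does not produce the feature
        return 1.0 - p_absent

    # State (57-97 98): only topics 57 and 98 are relevant (E1 and E2),
    # so only they appear as causes.
    p_present = noisy_or([p_due_to[57], p_due_to[98]])
    # p_absent = (1 - 0.20) * (1 - 0.05) = 0.76, so p_present = 0.24.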
3.3 Document Scoring

The Bayesian inversion described in Section 2.1 yields, for each document, the posterior probability that the document is relevant to each state. We calculate the posterior probability that the document is relevant to a given topic by summing the posterior probabilities of all of the states in which that topic is relevant. For example, the posterior probability for topic 57 is the sum of the posterior probabilities of the five states in which topic 57 is relevant (refer to Table 3.4): (57 97 98), (57 97-98), (57-97 98), (57-97-98-74), and (57 74). The final list of documents for each topic contains the top 1000 documents, ranked in descending order by the posterior probability that they are relevant to that topic.
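A minimal sketch of this scoring step, with each state represented as the set of topics relevant in it (so (57-97 98) becomes {57, 98}), and with illustrative posterior values standing in for the output of the Bayesian inversion:

    # Posterior P(state | document) for each document; the numbers are
    # illustrative, not the system's actual output.
    doc_posteriors = {
        "SJMN-001": {frozenset({57, 97, 98}): 1e-5,
                     frozenset({57, 98}): 3e-4,
                     frozenset({57, 74}): 2e-5,
                     frozenset(): 0.99967},  # the irrelevant state U
    }

    def topic_score(posteriors, topic):
        """P(topic | doc): sum the posteriors of states containing it."""
        return sum(p for state, p in posteriors.items() if topic in state)

    def ranked_list(docs, topic, n=1000):
        """Top-n documents for a topic, in descending posterior order."""
        scored = sorted(((doc_id, topic_score(post, topic))
                         for doc_id, post in docs.items()),
                        key=lambda pair: pair[1], reverse=True)
        return scored[:n]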