3.2.2 Generating states from the topics

The states of the multiple-topic query can be generated automatically by enumerating all possible "relevant or not relevant" combinations of topics, subject to two rules:

1. Two mutually exclusive topics cannot both be relevant within the same state.

2. A topic that is a subset of another cannot be relevant within a state unless its superset topic is also relevant.

By construction, there is always one state, U, to which a document belongs if it is relevant to none of the topics. For the ten topics, these rules reduce the number of states drastically, from the theoretical maximum of 2^10 = 1024 states to 28 states. A state is identified by listing the topics to which a document in that state is relevant, with a minus sign marking the topics to which it is not relevant. For instance, documents relevant to the state (61 -74 85 -99) are relevant to topics 61 and 85 and are not relevant to topics 74 and 99. The list of states appears in the first column of Table 3.4. The enumeration can be carried out mechanically, as sketched below.
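The following is a minimal sketch of this enumeration in Python, using the topic group 61/74/85/99. The subset relation 61 ⊂ 85 is an assumption: it is consistent with Table 3.4, where topic 61 never appears as relevant without topic 85, but the paper does not list the topic relations explicitly. The mutual-exclusion list is left empty as a placeholder.

```python
from itertools import product

# Hypothetical relations for illustration; the excerpt does not spell out
# which TREC-2 topics are mutually exclusive or subsets of one another.
TOPICS = [61, 74, 85, 99]
MUTUALLY_EXCLUSIVE = []        # e.g. [(a, b)] if topics a and b never co-occur
SUBSET_OF = {61: 85}           # assumed: topic 61 is a subset of topic 85

def is_valid(state):
    """state maps topic number -> relevant (True) or not relevant (False)."""
    # Rule 1: two mutually exclusive topics cannot both be relevant.
    if any(state[a] and state[b] for a, b in MUTUALLY_EXCLUSIVE):
        return False
    # Rule 2: a subset topic is relevant only if its superset topic is.
    if any(state[sub] and not state[sup] for sub, sup in SUBSET_OF.items()):
        return False
    return True

def enumerate_states(topics):
    """Yield every 'relevant or not relevant' combination allowed by the rules."""
    for bits in product([True, False], repeat=len(topics)):
        state = dict(zip(topics, bits))
        if is_valid(state):
            yield state

for state in enumerate_states(TOPICS):
    # Print in the paper's notation: a minus sign marks "not relevant".
    print(" ".join(str(t) if rel else f"-{t}" for t, rel in state.items()))
```

Of the twelve combinations this prints, the all-negative one (-61 -74 -85 -99) corresponds to membership in U, and the remaining eleven match the rows for this topic group in Table 3.4.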
3.2.3 Generating state priors

In theory, the prior probabilities of the states can be calculated from their relative frequencies in the training set. However, since very few documents in the TREC-2 training set were evaluated for three or more topics, a different way to estimate the priors was required.

One estimate is provided by factoring each state's prior into a product of the priors of smaller compound topics. This can be accomplished without manual intervention. For instance, for the independent topics 88, 89, and 90, only three numbers are needed (the prior probabilities of the three topics in the training set) to compute the probabilities of all seven states containing the three topics. For the state (61 -74 85 -99), two numbers are needed: the prior probability of the compound topic (-74 85 -99) and the probability that topic 61 is relevant given that topic 85 is relevant.

The priors obtained by this method are shown in the second column of Table 3.4. They are expressed as inverse frequencies: the number given is the average number of weeks between articles relevant to a state, assuming 1000 documents per week.

State             Average Weeks between      Average Weeks between
                  Articles (Assessed         Articles (Assessed
                  Automatically)             Manually)
-88 -89  90              < 1                      3
-88  89 -90              < 1                      3
-88  89  90               20                      7
 88 -89 -90              < 1                      2
 88 -89  90               20                      4
 88  89 -90               40                      4
 88  89  90             4000                      9
 57  97  98               10                      8
 57  97 -98              < 1                      5
 57 -97  98              < 1                      6
 57 -97 -98              < 1                      3
-57  97  98                1                      6
-57  97 -98              < 1                      4
-57 -97  98              < 1                      3
 61  74  85  99           33                     20
-61  74  85  99           14                      3
 61  74  85 -99          < 1                     50
-61  74  85 -99          < 1                      3
 61 -74  85  99          < 1                     20
-61 -74  85  99          < 1                      5
 61 -74  85 -99            5                    100
-61 -74  85 -99            2                      2
 74 -85  99              < 1                      3
 74 -85 -99              < 1                      2
-74 -85  99              < 1                      3
 57  74                  < 1                     20
 85  98                   50
 U

Table 3.4: The states and their prior probabilities, expressed as inverse frequencies (assuming 1000 articles per week), as generated by the two assessment methods

However, these priors proved unsatisfactory and required manual override, for three reasons. First, the training set is a very biased sample of the WSJ: the training documents were selected by the retrieval systems of TREC-1 precisely because they appeared relevant to at least one of the topics. A prior derived from the training set therefore tends to be a gross overestimate of the true prior.

Second, the time period of the training set (1987 to 1992) differs from that of the test set (1991 only), and the frequencies of the topics relative to one another change over time. For instance, there are many fewer articles on the Iran-Contra affair (relative to the other topics) in 1991 than there were in 1987-1990. We adjusted the priors to match the relative frequencies expected in the test set.
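The factoring and the inverse-frequency conversion above are simple arithmetic; the sketch below illustrates both. All numerical values in it are made-up placeholders, not the priors actually estimated from the WSJ training set.

```python
DOCS_PER_WEEK = 1000

# Placeholder per-document priors P(topic relevant) for the independent
# topics 88, 89, and 90; the real training-set estimates are not given here.
p = {88: 1e-4, 89: 2e-4, 90: 5e-5}

def state_prior(state):
    """P(state) as a product of marginals; valid only for independent topics."""
    prob = 1.0
    for topic, relevant in state.items():
        prob *= p[topic] if relevant else (1.0 - p[topic])
    return prob

def weeks_between_articles(prob):
    """Inverse frequency: average weeks between articles relevant to a state."""
    return 1.0 / (DOCS_PER_WEEK * prob)

# Example: the state (88 89 -90).
s = {88: True, 89: True, 90: False}
pr = state_prior(s)
print(f"P = {pr:.3e}, about {weeks_between_articles(pr):.0f} weeks between articles")

# For a dependent state such as (61 -74 85 -99), the factoring instead uses
# P(-74 85 -99) * P(61 relevant | 85 relevant); again with placeholder values:
p_compound = 2e-4     # hypothetical P(-74 85 -99)
p_61_given_85 = 0.3   # hypothetical P(61 | 85)
pr2 = p_compound * p_61_given_85
print(f"P = {pr2:.3e}, about {weeks_between_articles(pr2):.0f} weeks between articles")
```

With three marginals, state_prior recovers all seven states over topics 88, 89, and 90 in which at least one topic is relevant; the all-negative combination is folded into U.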