NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)

Bayesian Inference with Node Aggregation for Information Retrieval
B. Del Favero and R. Fung
3.2.2 Generating states from the topics
The states of the multiple-topic query can be generated
automatically by enumerating all possible "relevant or not
relevant" combinations of topics, subject to two rules:
1. Two mutually exclusive topics cannot both be
relevant within the same state.
2. A topic that is a subset of another cannot be
relevant within a state unless its superset topic
is also relevant.
By construction, there is always one state, U, to which a
document is relevant if it is relevant to none of the topics.
For the ten topics, these rules reduce the number of states
drastically, from the theoretical maximum of 2^10 = 1024
states to 28 states.
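As a sketch of how this enumeration might be automated (the
mutual-exclusion pair below is hypothetical, and the subset relation
between topics 61 and 85 is inferred from the factoring example in
Section 3.2.3, not stated outright):

```python
from itertools import product

# Illustrative sketch, not the authors' code. The exclusion and subset
# pairs are assumptions: the full topic structure behind the 28 states
# is not spelled out in this excerpt.
TOPICS = [57, 61, 74, 85, 88, 89, 90, 97, 98, 99]
MUTUALLY_EXCLUSIVE = [(57, 88)]  # hypothetical pair, for illustration
SUBSET_OF = {61: 85}             # 61 relevant requires 85 relevant

def is_valid(state):
    # Rule 1: two mutually exclusive topics cannot both be relevant.
    for a, b in MUTUALLY_EXCLUSIVE:
        if state[a] and state[b]:
            return False
    # Rule 2: a subset topic cannot be relevant unless its superset is.
    for sub, sup in SUBSET_OF.items():
        if state[sub] and not state[sup]:
            return False
    return True

states = []
for bits in product([False, True], repeat=len(TOPICS)):
    state = dict(zip(TOPICS, bits))  # topic -> "relevant in this state?"
    if is_valid(state):
        states.append(state)

# The all-not-relevant assignment plays the role of the state U.
```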
A state is identified by listing the topics, with a minus sign
prefixed to each topic to which the state's documents are not
relevant. For instance, documents relevant to the state
(61 -74 85 -99) are relevant to topics 61 and 85 and are not
relevant to topics 74 and 99. The list of states appears in
the first column of Table 3.4.
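A state's label follows mechanically from its relevance assignment;
a hypothetical helper, mirroring the notation above:

```python
def state_label(state):
    """Format a state as in Table 3.4: a minus sign marks a topic the
    state's documents are not relevant to, e.g.
    {61: True, 74: False, 85: True, 99: False} -> '61 -74 85 -99'."""
    return " ".join(str(t) if rel else f"-{t}"
                    for t, rel in sorted(state.items()))
```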
3.2.3 Generating state priors
In theory, the prior probabilities of the states can be
calculated from their relative frequencies in the training set.
However, since very few documents in the TREC-2 training set
were evaluated for three or more topics, a different way to
estimate the priors was required.
One estimate is provided by factoring each state's prior into
a product of the priors of smaller compound topics. This
can be accomplished without manual intervention. For
instance, for the independent topics 88, 89, and 90, only
three numbers are needed (the prior probabilities of the
three topics in the training set) to compute the probabilities
of all seven states involving those topics. For the state
(61 -74 85 -99), two numbers are needed: the prior
probability of the compound topic (-74 85 -99) and the
probability that topic 61 is relevant given that topic 85 is
relevant. The priors obtained by this method are shown in
the second column of Table 3.4. They are expressed as
inverse frequencies: the number given is the average number
of weeks between articles relevant to a state, assuming 1000
documents per week.
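A minimal sketch of the factoring, using made-up marginal priors
(the training-set values are not given in this excerpt):

```python
# Factoring a state's prior under the independence assumption described
# above. The marginal priors are placeholders, not the paper's values.
priors = {88: 0.004, 89: 0.003, 90: 0.002}  # hypothetical P(topic relevant)

def state_prior(state, priors):
    """Prior of a state such as (88 -89 90): the product, over its
    topics, of each topic's marginal prior (if relevant) or its
    complement (if not relevant)."""
    p = 1.0
    for topic, relevant in state.items():
        p *= priors[topic] if relevant else 1.0 - priors[topic]
    return p

p_state = state_prior({88: True, 89: False, 90: True}, priors)

# For the dependent case in the text, the factoring instead uses a
# conditional: P(61 -74 85 -99) = P(-74 85 -99) * P(61 | 85).

# Table 3.4 reports priors as inverse frequencies: average weeks
# between relevant articles, assuming 1000 documents per week.
weeks_between_articles = 1.0 / (1000 * p_state)
```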
                     Average weeks between articles
State                Assessed           Assessed
                     automatically      manually
---------------------------------------------------
-88 -89 90               < 1                3
-88 89 -90               < 1                3
-88 89 90                 20                7
88 -89 -90               < 1                2
88 -89 90                 20                4
88 89 -90                 40                4
88 89 90                4000                9
57 97 98                  10                8
57 97 -98                < 1                5
57 -97 98                < 1                6
57 -97 -98               < 1                3
-57 97 98                  1                6
-57 97 -98               < 1                4
-57 -97 98               < 1                3
61 74 85 99               33               20
-61 74 85 99              14                3
61 74 85 -99             < 1               50
-61 74 85 -99            < 1                3
61 -74 85 99             < 1               20
-61 -74 85 99            < 1                5
61 -74 85 -99              5              100
-61 -74 85 -99             2                2
74 -85 99                < 1                3
74 -85 -99               < 1                2
-74 -85 99               < 1                3
57 74                    < 1               20
85 98                     50
U
---------------------------------------------------

Table 3.4: The states and their prior probabilities, expressed as
inverse frequencies (average weeks between relevant articles,
assuming 1000 articles per week), as generated by the two
assessment methods
However, these priors proved unsatisfactory and required
manual override, for three reasons. First, the training set is
a very biased sample of the WSJ: the training documents were
deliberately selected by the TREC-1 retrieval systems because
they appeared relevant to at least one of the topics. Thus, a
prior derived from the training set tends to be a gross
overestimate of the true prior.
Second, the time period of the training set (1987 to 1992) is
different from that of the test set (only 1991). The
frequency of the topics relative to each other changes over
time. For instance, there are many fewer articles on the
Iran-Contra affair (relative to the other topics) in 1991 than
there were in 1987-1990. We adjusted the priors to match
the relative frequencies expected in the test set.
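A minimal sketch of such an adjustment, with invented drift factors
(the paper does not report the values used):

```python
# Hypothetical rescaling of state priors for the drift between the
# training period (1987-1992) and the test period (1991). The drift
# factors are invented for illustration; the paper only says the priors
# were adjusted to the frequencies expected in the test set.
state_priors = {"88 -89 90": 8.0e-6, "61 -74 85 -99": 2.0e-4}  # placeholders
drift = {"88 -89 90": 1.5, "61 -74 85 -99": 0.2}  # 0.2: topic fading by 1991

adjusted = {s: p * drift[s] for s, p in state_priors.items()}

# One way to keep the total probability mass at 1: fold the difference
# into the catch-all state U, which is relevant to none of the topics
# and holds nearly all of the mass.
adjusted["U"] = 1.0 - sum(adjusted.values())
```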