SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Bayesian Inference with Node Aggregation for Information Retrieval
chapter
B. Del Favero
R. Fung
National Institute of Standards and Technology
D. K. Harman
Topic Number   Topic Description
57 Financial Health of MCI
61 Israeli Role in the Iran-Contra Affair
74 Conflicting Policies of the US Government
85 Official Corruption in any government
88 Crude Oil Price Trends
89 Downstream investing by OPEC members
90 Data on Proven Reserves of Oil and Gas
97 Fiber Optic Applications
98 Fiber Optic Equipment Manufacturers
99 Iran-Contra Affair
Table 3.1: List of the 10 topics we selected
The ten topics were selected to provide an interesting
mixture of relationships among the topics. Before looking
at the relevance information in the training set, we assessed
these relationships manually, as shown in Table 3.2. The
table is symmetrical about the main diagonal.
57 61 74 85 88 89 90 97 98 99
e i i [OCRerr] d
e dI[OCRerr][OCRerr][OCRerr][OCRerr][OCRerr] sd
i d e d i i i i [OCRerr]sd
i s d e i i i i sd
i i e d d i
- i[OCRerr]i[OCRerr]d[OCRerr]e[OCRerr]d -
i
[OCRerr] d e [OCRerr] [OCRerr]
d
i i e d
d i i d e
sd sd sd i i e
d = dependent
s = subset
sd = subset or dependent
i = independent
e = equivalent
empty = mutually exclusive
57
61
74
85
88
89
90
97
98
99
57 61 74 85 88 89 90 97 98 99
57 e [OCRerr] [OCRerr]
61 [OCRerr] e d s * * d
74 [OCRerr]d[OCRerr]d e d * d
85 s d e [OCRerr] * * d* [OCRerr]d
88 [OCRerr] [OCRerr] e i i [OCRerr] * *
89 i e i [OCRerr] *
90 * [OCRerr] i i e [OCRerr] [OCRerr]
97 [OCRerr]d [OCRerr]* [OCRerr]* [OCRerr]* [OCRerr] [OCRerr] [OCRerr] e d *
98 [OCRerr]
99 * d [OCRerr]d d [OCRerr]* [OCRerr]* [OCRerr] [OCRerr] [OCRerr] e
* = manual intervention required
Table 3.3: Pairwise relationships determined from the data
The system's assessments match the manual assessments
quite well. The system found even more mutually
exclusive pairs than we had intuitively expected. One
surprise is the relationship between topic 61 ("Israeli Role
in the Iran-Contra Affair") and topic 99 ("Iran-Contra Affair"). One
might assume that 61 is a subset of 99, but there are indeed
documents in the training set that are relevant to 61 but not
to 99. Thus, the relation between 61 and 99 is dependence
rather than subset.
The information in Table 3.3 can be expressed as a directed
graph, as shown in Figure 3.1. Each topic is a node. Two
topics are connected by an arc if at least one
document is relevant to them both. There is no arc between
mutually exclusive topics. The arcs marked "i" connect
independent topics, the unmarked arcs connect topics that
are dependent, and a directed arc points from a superset to
its subset.
Figure 3.1 is not a belief network. It may be considered a
"co-occurrence diagram," since topics that are relevant
together (that co-occur) in the collection are connected.
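
As a rough illustration of the arc rules described for Figure 3.1, the following Python sketch turns a set of pairwise relationship codes into a list of arcs. The function name cooccurrence_edges, the tuple layout of each arc, and the convention that a subset pair is keyed as (superset, subset) are assumptions made for this example and are not taken from the system described here. The sample codes are illustrative and are not a transcription of Table 3.3.

```python
def cooccurrence_edges(relations):
    """Build arcs for a co-occurrence diagram from pairwise codes.

    relations maps (topic_a, topic_b) to a code from the table legend:
    "" (mutually exclusive), "i", "d", "sd", or "s".
    Assumption: for code "s" the key is ordered (superset, subset).
    Returns a list of (source, target, kind, label) tuples.
    """
    edges = []
    for (a, b), code in sorted(relations.items()):
        if code == "":
            continue  # mutually exclusive topics get no arc
        if code == "s":
            # directed arc from the superset to its subset
            edges.append((a, b, "directed", ""))
        elif code == "i":
            # independent topics: undirected arc marked "i"
            edges.append((a, b, "undirected", "i"))
        else:
            # dependent topics ("d" or "sd"): unmarked undirected arc
            edges.append((a, b, "undirected", ""))
    return edges


# Illustrative input only; the codes below are hypothetical examples.
example = {
    (57, 97): "",
    (61, 99): "d",
    (88, 89): "i",
    (97, 98): "d",
}
arcs = cooccurrence_edges(example)
```

The result is a plain edge list, which matches the point made above that the diagram records co-occurrence only and carries no conditional probabilities, so it should not be read as a belief network.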
Table 3.2: Pairwise relationships between 10 TREC topics,
assessed manually before looking at the training data
Table 3.3 shows the topic relationships that were generated
automatically from the relevance judgements on the
training documents. Manual verification of the system's
assessment was required in about half of the cases (marked
with an asterisk). The table is symmetrical about the main
diagonal.
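
The derivation of pairwise relationships from relevance judgements can be sketched as follows. This is a minimal sketch under stated assumptions, not the system's actual procedure: the text does not say how dependent and independent pairs are told apart, so the comparison of the observed overlap with the overlap expected under independence, the tolerance tol, and the names classify_pair and pairwise_table are all hypothetical. Each topic is represented by the set of identifiers of the training documents judged relevant to it.

```python
from itertools import combinations


def classify_pair(rel_a, rel_b, n_docs, tol=0.5):
    """Classify two topics from their sets of relevant document ids.

    Returns a code from the table legend: "" (mutually exclusive),
    "e" (equivalent), "s" (subset), "i" (independent), "d" (dependent).
    """
    both = rel_a & rel_b
    if not both:
        return ""  # no document is relevant to both topics
    if rel_a == rel_b:
        return "e"
    if rel_a <= rel_b or rel_b <= rel_a:
        return "s"
    # Assumed test (not from the source): under independence the expected
    # number of shared documents is |A| * |B| / N.
    expected = len(rel_a) * len(rel_b) / n_docs
    ratio = len(both) / expected
    return "i" if abs(ratio - 1.0) <= tol else "d"


def pairwise_table(relevant, n_docs):
    """Build a symmetric table of codes for every pair of topics."""
    table = {(t, t): "e" for t in relevant}
    for a, b in combinations(sorted(relevant), 2):
        code = classify_pair(relevant[a], relevant[b], n_docs)
        table[(a, b)] = table[(b, a)] = code
    return table


# Hypothetical judgements: some documents are relevant to topic 61 but
# not to topic 99, so the pair is not a subset relationship.
relevant = {61: {1, 2, 3}, 99: {2, 3, 4, 5}, 97: {10, 11}}
table = pairwise_table(relevant, n_docs=100)
```

With these made-up judgements, topics 61 and 99 come out as dependent rather than subset, and topic 97 is mutually exclusive with both. A borderline ratio is the kind of case where manual verification, as marked by the asterisks in Table 3.3, would be a sensible fallback.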
Figure 3.1: Relationships diagram among 10 TREC topics