SP500215
NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Bayesian Inference with Node Aggregation for Information Retrieval
B. Del Favero, R. Fung
National Institute of Standards and Technology
D. K. Harman

    Topic
    Number   Topic Description
    57       Financial Health of MCI
    61       Israeli Role in the Iran-Contra Affair
    74       Conflicting Policies of the US Government
    85       Official Corruption in any government
    88       Crude Oil Price Trends
    89       Downstream investing by OPEC members
    90       Data on Proven Reserves of Oil and Gas
    97       Fiber Optic Applications
    98       Fiber Optic Equipment Manufacturers
    99       Iran-Contra Affair

Table 3.1: List of the 10 topics we selected

The ten topics were selected to provide an interesting mixture of relationships among the topics. Before looking at the relevance information in the training set, we assessed these relationships manually, as shown in Table 3.2. The table is symmetrical about the main diagonal.

[Table 3.2 is a 10 x 10 matrix of relationship codes whose rows and columns are topics 57, 61, 74, 85, 88, 89, 90, 97, 98, and 99. The cell entries are illegible in this copy.]

    d     = dependent
    s     = subset
    sd    = subset or dependent
    i     = independent
    e     = equivalent
    empty = mutually exclusive

Table 3.2: Pairwise relationships between 10 TREC topics, assessed manually before looking at the training data

Table 3.3 shows the topic relationships that were generated automatically from the relevance judgements on the training documents. Manual verification of the system's assessment was required in about half of the cases (marked with an asterisk). The table is symmetrical about the main diagonal.

[Table 3.3 is a 10 x 10 matrix over the same topics, using the same codes. The cell entries are illegible in this copy.]

    * = manual intervention required

Table 3.3: Pairwise relationships determined from the data

The system's assessments match the manual assessments quite well. The system found even more mutually exclusive pairs than we had intuitively thought. One surprise is the relationship between topic 61 ("Israeli Role in the Iran-Contra Affair") and topic 99 ("Iran-Contra Affair"). One might assume that 61 is a subset of 99, but there are indeed documents in the training set that are relevant to 61 but not to 99. Thus, the relation between 61 and 99 is dependence rather than subset.

The information in Table 3.3 can be expressed as a directed graph, as shown in Figure 3.1. Each topic is a node. Two topics are connected by an arc if at least one document is relevant to both. There is no arc between mutually exclusive topics. The arcs marked "i" connect independent topics, the unmarked arcs connect topics that are dependent, and a directed arc points from a superset to its subset.

Figure 3.1 is not a belief network. It may be considered a "co-occurrence diagram," since topics that are relevant together (that co-occur) in the collection are connected.

[Figure 3.1 is a graph with one node per topic (57, 61, 74, 85, 88, 89, 90, 97, 98, 99). The drawing did not survive extraction.]

Figure 3.1: Relationships diagram among 10 TREC topics
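The automatic assessment behind Table 3.3 and the arcs of Figure 3.1 can be sketched as follows. This is only an illustration under stated assumptions, not the procedure the system actually used: the text does not say how dependence was told apart from independence, so the ratio test and the tolerance `tol` below are our own placeholders. The function names, the topic labels "A" to "D", and the document ids are likewise hypothetical. The codes "s<" and "s>" are a local convention that records the direction of a subset relation.

```python
from itertools import combinations


def classify_pair(a, b, n_docs, tol=0.5):
    """Return a relationship code for two sets of relevant document ids.

    Codes follow the legend of Table 3.2; "" stands for mutually exclusive.
    ASSUMPTION: the independence check and `tol` are illustrative only.
    """
    both = a & b
    if not both:
        return ""        # no document is relevant to both topics
    if a == b:
        return "e"       # equivalent
    if a < b:
        return "s<"      # a is a proper subset of b
    if b < a:
        return "s>"      # b is a proper subset of a
    # Co-occurrences expected if relevance to a and to b were independent.
    expected = len(a) * len(b) / n_docs
    ratio = len(both) / expected
    return "i" if abs(ratio - 1.0) <= tol else "d"


def relationship_table(relevant, n_docs, tol=0.5):
    """Classify every unordered pair of topics (upper triangle only)."""
    return {
        (s, t): classify_pair(relevant[s], relevant[t], n_docs, tol)
        for s, t in combinations(sorted(relevant), 2)
    }


def cooccurrence_arcs(table):
    """Turn the table into arcs; mutually exclusive pairs get no arc."""
    arcs = []
    for (s, t), code in table.items():
        if code == "":
            continue
        if code == "s<":
            arcs.append((t, s, "subset"))   # directed: superset t -> subset s
        elif code == "s>":
            arcs.append((s, t, "subset"))   # directed: superset s -> subset t
        else:
            arcs.append((s, t, code))       # "e", "i" or "d"; direction unused
    return arcs


# Made-up relevance judgements over a collection of 20 documents.
relevant = {
    "A": {1, 2, 3, 4},
    "B": {1, 2, 3, 4, 5, 6},
    "C": {4, 7},
    "D": {10, 11},
}
table = relationship_table(relevant, n_docs=20)
arcs = cooccurrence_arcs(table)
```

In this toy data, topic "C" overlaps "B" without being contained in it, which mirrors the situation of topics 61 and 99: one document relevant to the narrower-sounding topic but not to the broader one is enough to rule out the subset relation. Any pair the test cannot decide with confidence would still need the manual check that Table 3.3 marks with an asterisk.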