NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Bayesian Inference with Node Aggregation for Information Retrieval
B. Del Favero
R. Fung
National Institute of Standards and Technology
D. K. Harman
[Figure 5.1: Precision vs. Recall for system idsra2 (plot not reproduced; only axis labels survived)]

Topic    Average Precision
Number   idsra2   Best    Median   Worst
57       0.387    0.460   0.374    0.000
61       0.464    0.464   0.083    0.000
74       0.008    0.074   0.008    0.000
85       0.174    0.353   0.174    0.000
89       0.081    0.259   0.077    0.000
90       0.025    0.025   0.000    0.000
97       0.383    0.383   0.202    0.002
98       0.282    0.427   0.334    0.000
99       0.700    0.700   0.509    0.000

Table 5.1c: Average precision (as defined for TREC-2),
for idsra2 and for all systems
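A precision-vs.-recall curve such as Figure 5.1 is conventionally built from interpolated precision: the precision credited at recall level r is the best precision achieved at any recall of at least r. A minimal sketch of that computation follows; the ranked list and relevance judgments are invented for illustration and are not the idsra2 results.

```python
def interpolated_pr_curve(relevant, ranked, levels=None):
    """Interpolated precision at standard recall levels.

    relevant: set of relevant document ids for a topic.
    ranked: list of document ids in retrieval order.
    Interpolated precision at recall r is the maximum precision
    observed at any recall >= r.
    """
    if levels is None:
        levels = [i / 10 for i in range(11)]  # 0.0, 0.1, ..., 1.0
    hits = 0
    points = []  # (recall, precision) observed at each rank
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    curve = []
    for r in levels:
        precs = [p for rec, p in points if rec >= r]
        curve.append(max(precs) if precs else 0.0)
    return curve

# Hypothetical ranking: ids beginning with "x" are the relevant ones.
ranked = ["x1", "d2", "x3", "d4", "x5"]
relevant = {"x1", "x3", "x5", "x9"}
curve = interpolated_pr_curve(relevant, ranked)
```

Because interpolation takes a maximum over later recall points, the resulting curve is monotonically non-increasing, which is why plots like Figure 5.1 slope downward.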
Topic Relevant Retrieved at 100
Number idsra2 Best Median Worst
57 17 18 17 0
61 19 19 9 0
74 6 16 6 0
85 33 54 33 1
89 2 3 2 0
90 1 1 0 0
97 25 28 18 1
98 24 24 17 0
99 60 60 52 0
Table 5.1a: Relevant documents in the top 100 retrieved,
for idsra2 and for all systems
Topic Relevant Retrieved at 1000
Number idsra2 Best Median Worst
57 18 19 18 0
61 24 25 24 0
74 11 31 11 1
85 88 115 88 2
89 2 4 2 0
90 1 3 0 0
97 27 32 27 1
98 26 29 26 0
99 70 70 66 0
Table 5.1b: Relevant documents in the top 1000 retrieved,
for idsra2 and for all systems
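The per-topic average precision scores in Table 5.1c can be illustrated with one standard definition, non-interpolated average precision over all relevant documents; whether this matches the exact TREC-2 definition should be checked against the evaluation appendix. The example data below is invented.

```python
def average_precision(relevant, ranked):
    """Non-interpolated average precision: the precision at each
    rank where a relevant document appears, summed and divided by
    the total number of relevant documents (so relevant documents
    never retrieved contribute zero)."""
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Hypothetical topic: three relevant documents, two retrieved
# at ranks 1 and 3, one missed entirely.
ap = average_precision({"a", "b", "c"}, ["a", "x", "b", "y"])
```

Under this definition a topic like 90, with a single relevant document found early, can still score low overall if other systems place that document at rank 1.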
6 Conclusions and Future Directions
We believe that we have made significant progress toward
developing an information retrieval architecture that:

* is oriented towards assisting users with stable
information needs in routing large amounts of time-
sensitive material,
* gives users an intuitive language with which to specify
their information needs,
* requires modest computational resources, and
* can integrate relevance feedback and training data with
users' judgements to incrementally improve retrieval
performance.
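The last point, folding user judgements into the model incrementally, can be sketched with a toy count-based update. This is an illustration of incremental relevance feedback in general, not the node-aggregation scheme described in this chapter; the class and its smoothing choice are hypothetical.

```python
from collections import defaultdict

class FeedbackModel:
    """Toy incremental model: for each feature, count how often it
    appears in documents judged relevant vs. non-relevant, and
    estimate P(feature | relevant) with Laplace smoothing."""

    def __init__(self):
        self.rel = defaultdict(int)     # feature -> count in relevant docs
        self.nonrel = defaultdict(int)  # feature -> count in non-relevant docs
        self.n_rel = 0
        self.n_nonrel = 0

    def update(self, features, judged_relevant):
        """Fold in one user judgement for a document's feature set."""
        if judged_relevant:
            self.n_rel += 1
            counts = self.rel
        else:
            self.n_nonrel += 1
            counts = self.nonrel
        for f in features:
            counts[f] += 1

    def p_feature_given_relevant(self, f):
        # Laplace-smoothed estimate: one pseudo-success in two pseudo-trials.
        return (self.rel[f] + 1) / (self.n_rel + 2)

m = FeedbackModel()
m.update({"bayes", "network"}, judged_relevant=True)
m.update({"network"}, judged_relevant=False)
```

Because each judgement only increments counters, the model can be updated one document at a time without retraining, which is the property the bullet above is after.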
We are encouraged by the test results. We have not yet had
much time to analyze them, but we intend to investigate why
we did well on some topics and poorly on others. Very
preliminary analysis suggests that the features for the
topics on which we did well (e.g., 61 and 99) were much
more informative than those for the topics on which we did
poorly (e.g., 74).
We have many ideas for future research. These ideas fall
into three basic categories: probabilistic representation,
user interface, and inference methods.
The most important improvements we would like to make
are in the category of probabilistic representation of the
topic and the document. One research goal is to develop a
way to intuitively represent relationships between features.
Also, we would like to explore more sophisticated feature
extractors that recognize phrases, synonyms, and features
derived from natural language processing. We believe that
achievement of these goals could lead to significant
improvements in performance.
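As a concrete picture of what a richer feature extractor might do, the sketch below emits unigrams, known two-word phrases, and canonical forms for known synonyms. The phrase list and synonym map are illustrative inputs supplied by hand, not the learned or NLP-derived resources the text envisions.

```python
def extract_features(text, phrases, synonyms):
    """Toy feature extractor: unigrams, plus known two-word phrases,
    with synonyms mapped to a single canonical feature."""
    tokens = text.lower().split()
    feats = set()
    for i, tok in enumerate(tokens):
        # Map synonym variants onto one canonical feature name.
        feats.add(synonyms.get(tok, tok))
        # Recognize listed two-word phrases as single features.
        if i + 1 < len(tokens):
            bigram = f"{tok} {tokens[i + 1]}"
            if bigram in phrases:
                feats.add(bigram)
    return feats

feats = extract_features(
    "Information retrieval systems rank documents",
    phrases={"information retrieval"},
    synonyms={"documents": "document", "rank": "ranking"},
)
```

Phrase features like "information retrieval" are more informative than their constituent unigrams, which is one route to the kind of topic-level improvement discussed above.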