Bayesian Inference with Node Aggregation for Information Retrieval

B. Del Favero and R. Fung

[Figure 5.1: Precision vs. Recall for system idsra2. Precision (vertical axis, 0 to 1.0) plotted against Recall (horizontal axis, 0 to 1.0).]

Topic     Relevant Retrieved at 100
Number    idsra2    Best    Median    Worst
  57        17       18       17        0
  61        19       19        9        0
  74         6       16        6        0
  85        33       54       33        1
  89         2        3        2        0
  90         1        1        0        0
  97        25       28       18        1
  98        24       24       17        0
  99        60       60       52        0

Table 5.1a: Relevant documents in the top 100 retrieved, for idsra2 and for all systems

Topic     Relevant Retrieved at 1000
Number    idsra2    Best    Median    Worst
  57        18       19       18        0
  61        24       25       24        0
  74        11       31       11        1
  85        88      115       88        2
  89         2        4        2        0
  90         1        3        0        0
  97        27       32       27        1
  98        26       29       26        0
  99        70       70       66        0

Table 5.1b: Relevant documents in the top 1000 retrieved, for idsra2 and for all systems

Topic     Average Precision
Number    idsra2    Best     Median    Worst
  57       0.387    0.460    0.374    0.000
  61       0.464    0.464    0.083    0.000
  74       0.008    0.074    0.008    0.000
  85       0.174    0.353    0.174    0.000
  89       0.081    0.259    0.077    0.000
  90       0.025    0.025    0.000    0.000
  97       0.383    0.383    0.202    0.002
  98       0.282    0.427    0.334    0.000
  99       0.700    0.700    0.509    0.000

Table 5.1c: Average precision (as defined for TREC-2), for idsra2 and for all systems

6 Conclusions and Future Directions

We believe that we have made significant progress toward developing an information retrieval architecture that:

* is oriented towards assisting users with stable information needs in routing large amounts of time-sensitive material,
* gives users an intuitive language with which to specify their information needs,
* requires modest computational resources, and
* can integrate relevance feedback and training data with users' judgements to incrementally improve retrieval performance.

We are encouraged by the test results. We have not yet had much time to analyze them, but we intend to investigate why we did very well on some topics and not so well on others. Very preliminary analysis suggests that the features for the topics on which we did well (e.g., 61 and 99) were much more informative than those for the topics on which we did poorly (e.g., 74).

We have many ideas for future research, falling into three basic categories: probabilistic representation, user interface, and inference methods. The most important improvements we would like to make are in the probabilistic representation of the topic and the document. One research goal is to develop an intuitive way to represent relationships between features. We would also like to explore more sophisticated feature extractors that recognize phrases, synonyms, and features derived from natural language processing. We believe that achieving these goals could lead to significant improvements in performance.
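As an aside on the evaluation measures reported in Tables 5.1a-5.1c, the following is a minimal sketch, not part of the system described in this paper, of how such figures are conventionally computed, assuming the standard TREC definitions: "relevant retrieved at k" counts the relevant documents among the top k of a ranking, and non-interpolated average precision averages the precision observed at the rank of each relevant document. The function names, the example ranking, and the relevance set are all hypothetical.

    def relevant_retrieved_at_k(ranking, relevant, k):
        # Count relevant documents among the top k retrieved
        # (the quantity reported in Tables 5.1a and 5.1b).
        return sum(1 for doc in ranking[:k] if doc in relevant)

    def average_precision(ranking, relevant):
        # Non-interpolated average precision over all relevant documents,
        # assuming this is the TREC-2 definition behind Table 5.1c.
        hits = 0
        precision_sum = 0.0
        for rank, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                precision_sum += hits / rank  # precision at this relevant document's rank
        return precision_sum / len(relevant) if relevant else 0.0

    # Hypothetical example: a five-document ranking for one topic.
    ranking = ["d3", "d7", "d1", "d9", "d2"]
    relevant = {"d3", "d1", "d2"}
    print(relevant_retrieved_at_k(ranking, relevant, 3))  # -> 2
    print(average_precision(ranking, relevant))           # -> (1/1 + 2/3 + 3/5) / 3, about 0.756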