It is interesting to compare our results with Verity's scores for these two topics. To do this we re-scored Verity's TOPIC2 results on the AP corpus alone.12 For Topic 52, Verity's results were:

    Queryid (Num):       52
    Total number of documents over all queries
        Retrieved:     1000
        Relevant:       345
        Rel_ret:        317
    Interpolated Recall - Precision Averages:
        at 0.00      1.0000
        at 0.10      0.9833
        at 0.20      0.9342
        at 0.30      0.8607
        at 0.40      0.8314
        at 0.50      0.7425
        at 0.60      0.7125
        at 0.70      0.6704
        at 0.80      0.6161
        at 0.90      0.3952
        at 1.00      0.0000
    Average precision (non-interpolated) over all rel docs: 0.7159
    Precision:
        At    5 docs: 1.0000
        At   10 docs: 1.0000
        At   15 docs: 1.0000
        At   20 docs: 1.0000
        At   30 docs: 1.0000
        At  100 docs: 0.9000
        At  200 docs: 0.7900
        At  500 docs: 0.5820
        At 1000 docs: 0.3170
    R-Precision (precision after R (= num_rel for a query) docs retrieved):
        Exact:        0.6812

Here we see better recall (317 of the 345 relevant documents retrieved) but with slightly lower precision. The TOPIC2 tree for this topic is much more complex than the one we developed, which explains the better recall. Notice, however, that both trees gave perfect precision for the first 30 documents.

For Topic 54, Verity's TOPIC2 results were:

    Queryid (Num):       54
    Total number of documents over all queries
        Retrieved:     1000
        Relevant:        65
        Rel_ret:         65
    Interpolated Recall - Precision Averages:
        at 0.00      1.0000
        at 0.10      0.9130
        at 0.20      0.9130
        at 0.30      0.9130
        at 0.40      0.9000
        at 0.50      0.7609
        at 0.60      0.5942
        at 0.70      0.5679
        at 0.80      0.5049
        at 0.90      0.2027
        at 1.00      0.0927
    Average precision (non-interpolated) over all rel docs: 0.6838
    Precision:
        At    5 docs: 1.0000
        At   10 docs: 0.8000
        At   15 docs: 0.8667
        At   20 docs: 0.9000
        At   30 docs: 0.9000
        At  100 docs: 0.5100
        At  200 docs: 0.2900
        At  500 docs: 0.1280
        At 1000 docs: 0.0650
    R-Precision (precision after R (= num_rel for a query) docs retrieved):
        Exact:        0.5846

This shows the same recall performance (i.e., all 65 relevant documents were retrieved) but substantially better precision performance. Through the first 30 documents TOPIC2 gave excellent results, whereas our modified model-2 result was only half as good. Again, however, the TOPIC2 tree is much more complex and required more effort to develop.13

Overall, we are impressed by the improved performance we were able to achieve with minimal manual effort. These auxiliary experiments provide at least suggestive evidence of the value of automatic generation of initial trees. The extent to which this is consistently achievable will require further investigation, and we hope to report on this in TREC-3.

5 Commentary

The official results of our TREC-2 experiments demonstrate that automatic construction of routing queries from training documents is indeed feasible. The queries produced are in fact binary classification trees that are optimal with respect to size (measured in terms of the number of terminals in the tree) and the estimated error rate of the tree. Unfortunately, however, these trees generally appear to have poor performance. In a few cases the trees were comparable with the results from other sites, but they mostly

12. We are grateful to Verity for allowing us to examine their TREC-2 results in detail.
13. We do not have precise figures for the amount of effort needed to build the Verity TOPIC2 trees, but in general each topic required several hours of effort.
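For reference, the following is a minimal sketch, not the official trec_eval program, of how the measures tabulated above (interpolated precision at the eleven standard recall points, non-interpolated average precision over all relevant documents, and R-precision) can be computed from a ranked result list. The names evaluate, ranking, and relevant are illustrative only and are not part of our system.

    # Sketch of trec_eval-style measures; assumes standard definitions.
    def evaluate(ranking, relevant):
        # ranking:  list of document ids in retrieval order
        # relevant: set of document ids judged relevant for the topic
        num_rel = len(relevant)
        hits = 0
        sum_prec = 0.0   # precision accumulated at each relevant document
        points = []      # (recall, precision) after each retrieved document

        for i, doc in enumerate(ranking, start=1):
            if doc in relevant:
                hits += 1
                sum_prec += hits / i
            points.append((hits / num_rel, hits / i))

        # Non-interpolated average precision: relevant documents that are
        # never retrieved contribute a precision of zero.
        avg_prec = sum_prec / num_rel

        # Interpolated precision at recall r is the maximum precision
        # observed at any recall level >= r.
        interp = []
        for level in [x / 10 for x in range(11)]:
            precs = [p for r, p in points if r >= level]
            interp.append((level, max(precs) if precs else 0.0))

        # R-Precision: precision after num_rel documents have been retrieved.
        r_prec = sum(1 for d in ranking[:num_rel] if d in relevant) / num_rel

        return interp, avg_prec, r_prec

Precision at the fixed document cutoffs (5, 10, ..., 1000 documents) is simply the ratio hits/i at those ranks.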