It is interesting to compare our results with Verity's
scores for these two topics. To do this we re-scored Verity's
TOPIC2 results on the AP corpus alone.12 For Topic 52,
Verity's results were:
Queryid (Num): 52
Total number of documents over all queries
Retrieved:    1000
Relevant:      345
Rel_ret:       317
Interpolated Recall - Precision Averages:
at 0.00 1.0000
at 0.10 0.9833
at 0.20 0.9342
at 0.30 0.8607
at 0.40 0.8314
at 0.50 0.7425
at 0.60 0.7125
at 0.70 0.6704
at 0.80 0.6161
at 0.90 0.3952
at 1.00 0.0000
Average precision (non-interpolated) over all rel docs:
    0.7159
Precision:
At 5 docs: 1.0000
At 10 docs: 1.0000
At 15 docs: 1.0000
At 20 docs: 1.0000
At 30 docs: 1.0000
At 100 docs: 0.9000
At 200 docs: 0.7900
At 500 docs: 0.5820
At 1000 docs: 0.3170
R-Precision (precision after R (= num_rel for a query) docs retrieved):
Exact: 0.6812
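These figures are the standard trec_eval measures. As a minimal sketch only, and not the program actually used to score the runs, the following Python fragment computes the same quantities from a single ranked list; the inputs ranking (document identifiers in retrieval order) and relevant (the set of identifiers judged relevant for the topic) are hypothetical.

def evaluate(ranking, relevant, cutoffs=(5, 10, 15, 20, 30, 100, 200, 500, 1000)):
    # Sketch of trec_eval-style measures for one topic; assumes at least one
    # relevant document.  Not the official scorer.
    num_rel = len(relevant)

    # Precision and recall after each retrieved document.
    hits, sum_prec = 0, 0.0
    prec_at_rank, recall_at_rank = [], []
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            sum_prec += hits / rank          # contributes to average precision
        prec_at_rank.append(hits / rank)
        recall_at_rank.append(hits / num_rel)

    # Non-interpolated average precision over all relevant documents.
    avg_prec = sum_prec / num_rel

    # Precision at fixed document cutoffs ("At 5 docs", "At 10 docs", ...).
    prec_at = {n: sum(1 for d in ranking[:n] if d in relevant) / n for n in cutoffs}

    # R-precision: precision after R (= num_rel) documents have been retrieved.
    r_prec = sum(1 for d in ranking[:num_rel] if d in relevant) / num_rel

    # Interpolated precision at the 11 standard recall levels: the maximum
    # precision at any point in the ranking whose recall reaches the level.
    interp = {}
    for level in (i / 10 for i in range(11)):
        eligible = [p for p, r in zip(prec_at_rank, recall_at_rank) if r >= level]
        interp[level] = max(eligible) if eligible else 0.0

    return {"avg_prec": avg_prec, "prec_at": prec_at,
            "r_prec": r_prec, "interp": interp}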
Here we see better recall (317 of the 345 relevant documents retrieved) but slightly lower precision. The TOPIC2 tree for this topic is much more complex than the one we developed, which explains the better recall. Notice, however, that both trees gave perfect precision for the first 30 documents.
For Topic 54, Verity's TOPIC2 results were:
Queryid (Num): 54
Total number of documents over all queries
Retrieved:    1000
Relevant:       65
Rel_ret:        65
Interpolated Recall - Precision Averages:
at 0.00 1.0000
at 0.10 0.9130
at 0.20 0.9130
at 0.30 0.9130
at 0.40 0.9000
at 0.50 0.7609
at 0.60 0.5942
at 0.70 0.5679
at 0.80 0.5049
at 0.90 0.2027
at 1.00 0.0927
Average precision (non-interpolated) over all rel docs:
    0.6838
Precision:
At 5 docs: 1.0000
At 10 docs: 0.8000
At 15 docs: 0.8667
At 20 docs: 0.9000
At 30 docs: 0.9000
At 100 docs: 0.5100
At 200 docs: 0.2900
At 500 docs: 0.1280
At 1000 docs: 0.0650
R-Precision (precision after R (= num_rel for a query) docs retrieved):
Exact: 0.5846
This shows the same recall performance (i.e., all 65 relevant
documents were retrieved) but substantially better precision
performance. Through the first 30 documents TOPIC2 gave
excellent results, whereas our modified model-2 result was
only half as good. Again, however, the TOPIC2 tree is much
more complex, and required more effort to develop.13
Overall, we are impressed by the improved performance we were able to achieve with minimal manual effort. These auxiliary experiments provide at least suggestive evidence of the value of automatic generation of initial trees. The extent to which this is consistently achievable will require further investigation, and we hope to report on this in TREC-3.
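As a minimal sketch of what automatic generation of an initial tree from training documents can look like, the fragment below induces a binary classification tree over binary term-presence features and prunes it by trading tree size against estimated error. It uses scikit-learn's CART implementation as a stand-in rather than the actual learning procedure behind these experiments, and the document texts, labels, and ccp_alpha value are invented for illustration.

# Sketch only: scikit-learn CART as a stand-in, with invented training data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier, export_text

train_docs = [
    "tanker runs aground and spills crude oil off the coast",    # hypothetical
    "cleanup crews contain the oil spill near the shoreline",    # hypothetical
    "stock prices fell sharply in heavy trading",                # hypothetical
    "the central bank raised interest rates again",              # hypothetical
]
labels = [1, 1, 0, 0]   # 1 = relevant to the topic, 0 = not relevant

# Binary term-presence features: each internal node of the learned tree then
# tests for the presence of a single term, so the tree reads like a query.
vec = CountVectorizer(binary=True)
X = vec.fit_transform(train_docs)

# ccp_alpha controls minimal cost-complexity pruning, trading the number of
# terminal nodes against the estimated error rate of the tree.
tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=0)
tree.fit(X, labels)
print(export_text(tree, feature_names=list(vec.get_feature_names_out())))

# Routing: drop an incoming document through the tree.
incoming = ["an oil tanker spill prompts a large cleanup effort"]
print(tree.predict(vec.transform(incoming)))

Each path from the root to a leaf labeled relevant is a conjunction of term tests, which is one way such a tree can be read as a routing query.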
5 Commentary
The official results of our TREC-2 experiments demonstrate that automatic construction of routing queries from training documents is indeed feasible. The queries produced are in fact binary classification trees that are optimal with respect to size (measured in terms of the number of terminals in the tree) and the estimated error rate of the tree. Unfortunately, however, these trees generally appear to have poor performance. In a few cases the trees were comparable with the results from other sites, but they mostly
12. We are grateful to Verity for allowing us to examine their TREC-2 results in detail.
13. We do not have precise figures for the amount of effort needed to build the Verity TOPIC2 trees, but in general each topic required several hours of effort.