NIST Special Publication 500-215: The Second Text REtrieval Conference (TREC-2)
Machine Learning for Knowledge-Based Document Routing (A Report on the TREC-2 Experiment)
R. Tong and L. Appelbaum
National Institute of Standards and Technology
D. K. Harman
believe that the explanation for this is that by using stemmed versions of the features we added a significant amount of "noise" to the sample space. That is, given the relatively small size of the training sets, using stems tends to reduce the discriminating power of any given feature with respect to the training sets. This manifests itself indirectly in two ways. First, the optimal trees built with stems are generally smaller than those built using exact words; and, second, optimal trees built with stems have higher cross-validation error rates than those built using exact words.
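To make the comparison concrete, the following is a minimal sketch (not the code used in the experiment) of how one might measure this effect: build trees from binary word-occurrence features, once with exact words and once with crudely stemmed words, and compare cross-validated error. The corpus, labels, and toy stemmer are hypothetical stand-ins, and scikit-learn's DecisionTreeClassifier stands in for CART.

    # Sketch only: DecisionTreeClassifier stands in for CART; the corpus, labels,
    # and suffix-stripping "stemmer" below are hypothetical stand-ins.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    docs = [
        "tariff ruling on steel imports",
        "imported steel tariffs ruled unfair",
        "quarterly earnings beat forecasts",
        "earnings forecast raised for the quarter",
    ]
    labels = [1, 1, 0, 0]  # 1 = relevant to the routing topic, 0 = not relevant

    def crude_stem(token):
        # Toy suffix stripping; a real experiment would use a proper stemmer.
        for suffix in ("ings", "ing", "ed", "s"):
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token

    def cv_error(analyzer):
        # Binary word-occurrence features -> decision tree -> cross-validated error.
        X = CountVectorizer(binary=True, analyzer=analyzer).fit_transform(docs)
        return 1.0 - cross_val_score(DecisionTreeClassifier(), X, labels, cv=2).mean()

    print("CV error, exact words:", cv_error(lambda text: text.split()))
    print("CV error, stems      :", cv_error(lambda text: [crude_stem(t) for t in text.split()]))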
The second observation is that the topic trees built using model-2 (i.e., results ads3 and ads4) had better recall than those built using model-1. The explanation for this is straightforward. Since the model-2 trees essentially use all the features extracted from the information need statement in a generalized disjunction, they provide much broader coverage than the model-1 trees, which often use just one or two of the features.
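The coverage argument can be illustrated with a small sketch (hypothetical terms and documents, not drawn from the experiment): a query that ORs together all of the topic's features can only match a superset of the documents matched by a query built from one or two of those features. Real model-1 trees are not literally single-term queries, but the containment argument is the same.

    # Sketch only: hypothetical terms and documents illustrating why a generalized
    # disjunction over all topic features gives broader coverage (higher recall).
    model1_terms = {"tariff"}                                        # narrow tree: one split term
    model2_terms = {"tariff", "steel", "import", "quota", "trade"}   # all features, OR'd together

    docs = {
        "d1": {"steel", "quota", "dispute"},
        "d2": {"tariff", "ruling"},
        "d3": {"earnings", "forecast"},
        "d4": {"trade", "deficit", "import"},
    }

    def retrieved(query_terms):
        # A document matches a disjunctive query if it contains any query term.
        return {doc_id for doc_id, terms in docs.items() if terms & query_terms}

    print("model-1 retrieves:", sorted(retrieved(model1_terms)))  # subset of the line below
    print("model-2 retrieves:", sorted(retrieved(model2_terms)))  # superset, so recall can only be >=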
The third observation is, of course, that these are not strong results, since on average all the models performed in the low end of the scores reported by NIST in the summary routing table. This is somewhat disappointing, since our results in TREC-1 led us to believe that we might be able to do significantly better.11
Notwithstanding the fact that we did not explore some of the ideas discussed in the TREC-1 paper (e.g., the use of concepts rather than words as features, and the use of surrogate split information), we are now inclined to the view that the output from tools like CART is best used as the basis for manually constructed routing topics. To begin to explore this idea, we performed a number of auxiliary tests that we report in the following section.
4.3 Auxiliary Experiments
To explore the idea of using the CART output as the "skeleton" for a manually constructed routing query, we selected two model-2 trees to determine whether a minimal set of "edits" could significantly improve their performance. We selected Topics 52 and 54 since they represented one topic for which the automatically generated tree did well (Topic 52) and one for which the automatically generated tree did poorly (Topic 54).
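One way to picture the intended workflow is sketched below (all terms and edits are hypothetical, not the actual edits applied to Topics 52 and 54): the terms surfaced by the induced tree serve as a skeleton query, to which an analyst applies a small number of additions and deletions.

    # Sketch only: hypothetical skeleton and edits, illustrating the "minimal edit" idea.
    skeleton = {"pacific", "rim", "trade", "agreement"}   # terms surfaced by the induced tree

    manual_edits = {
        "add":    {"asean", "apec", "export"},            # concepts the tree missed
        "remove": {"agreement"},                          # a split term judged too noisy
    }

    edited_query = (skeleton | manual_edits["add"]) - manual_edits["remove"]
    print("edited routing query:", sorted(edited_query))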
The scores for Topic 52 (for the AP corpus only)
are shown below:
11. We do note, however, that for a number of topics we did rather well in comparison with other systems (i.e., Topics 51 and 75), and that in absolute terms we produced a number of trees that had greater than 30% R-precision (i.e., for Topics 52, 58, 78 and 93).
Queryid (Num):                52
Total number of documents over all queries
    Retrieved:        1000
    Relevant:          345
    Rel_ret:           328
Interpolated Recall - Precision Averages:
    at 0.00           0.6667
    at 0.10           0.4684
    at 0.20           0.4032
    at 0.30           0.4032
    at 0.40           0.4032
    at 0.50           0.4032
    at 0.60           0.4013
    at 0.70           0.4013
    at 0.80           0.4013
    at 0.90           0.4003
    at 1.00           0.0000
Average precision (non-interpolated) over all rel docs:
                      0.3828
Precision:
    At    5 docs:     0.4000
    At   10 docs:     0.3000
    At   15 docs:     0.4667
    At   20 docs:     0.6000
    At   30 docs:     0.6333
    At  100 docs:     0.4100
    At  200 docs:     0.3550
    At  500 docs:     0.3820
    At 1000 docs:     0.3280
R-Precision (precision after R (= num_rel for a query) docs retrieved):
    Exact:            0.3739
Here we see that this topic tree does well: recall is excellent and precision is sustained even at high recall levels.
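As a quick check of the recall claim (a back-of-the-envelope calculation, not from the original text), the counts in the table above give:

    relevant, rel_ret = 345, 328                  # counts from the Topic 52 table above
    print(f"recall = {rel_ret / relevant:.3f}")   # about 0.951: nearly all relevant documents retrieved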
In contrast, the topic tree for Topic 54 produces the following results:
Queryid (Num):                54
Total number of documents over all queries
    Retrieved:        1000
    Relevant:           65
    Rel_ret:            64
Interpolated Recall - Precision Averages:
    at 0.00           0.1323
    at 0.10           0.1323
    at 0.20           0.1323
    at 0.30           0.1323
    at 0.40           0.1323
    at 0.50           0.1323
    at 0.60           0.1323
    at 0.70           0.1102
    at 0.80           0.1102
    at 0.90           0.1097
    at 1.00           0.0000
Average precision (non-interpolated) over all rel docs:
                      0.0907
Precision:
    At    5 docs:
    At   10 docs: