believe that the explanation for this is that by using stemmed versions of the features we added a significant amount of "noise" to the sample space. That is, given the relatively small size of the training sets, using stems tends to reduce the discriminating power of any given feature with respect to the training sets. This manifests itself indirectly in two ways. First, the optimal trees built with stems are generally smaller than those built using exact words; and, second, optimal trees built with stems have higher cross-validation error rates than those built using exact words.

The second observation is that the topic trees built using model-2 (i.e., results ads3 and ads4) had better recall than those built using model-1. The explanation for this is straightforward. Since the model-2 trees essentially use all the features extracted from the information need statement in a generalized disjunction, they provide much broader coverage than the model-1 trees, which often use just one or two of the features.

The third observation is, of course, that these are not strong results, since on average all the models performed in the low end of the scores reported by NIST in the summary routing table. This is somewhat disappointing, since our results in TREC-1 led us to believe that we might be able to do significantly better.[11] Notwithstanding the fact that we did not explore some of the ideas discussed in the TREC-1 paper (e.g., the use of concepts rather than words as features, and the use of surrogate split information), we are now inclined to the view that the output from tools like CART is best used as the basis for manually constructed routing topics. To begin to explore this idea, we performed a number of auxiliary tests that we report in the following section.

[11] We do note, however, that for a number of topics we did rather well in comparison with other systems (i.e., Topics 51 and 75), and that in absolute terms we produced a number of trees that had greater than 30% R-precision (i.e., for Topics 52, 58, 78 and 93).

4.3 Auxiliary Experiments

To explore the idea of using the CART output as the "skeleton" for a manually constructed routing query, we selected two model-2 trees to determine whether a minimal set of "edits" could significantly improve their performance. We selected Topics 52 and 54 since they represented one topic for which the automatically generated tree did well (Topic 52) and one for which the automatically generated tree did poorly (Topic 54). The scores for Topic 52 (for the AP corpus only) are shown below:

Queryid (Num):        52
Total number of documents over all queries
    Retrieved:      1000
    Relevant:        345
    Rel_ret:         328
Interpolated Recall - Precision Averages:
    at 0.00       0.6667
    at 0.10       0.4684
    at 0.20       0.4032
    at 0.30       0.4032
    at 0.40       0.4032
    at 0.50       0.4032
    at 0.60       0.4013
    at 0.70       0.4013
    at 0.80       0.4013
    at 0.90       0.4003
    at 1.00       0.0000
Average precision (non-interpolated) over all rel docs
                  0.3828
Precision:
  At    5 docs:   0.4000
  At   10 docs:   0.3000
  At   15 docs:   0.4667
  At   20 docs:   0.6000
  At   30 docs:   0.6333
  At  100 docs:   0.4100
  At  200 docs:   0.3550
  At  500 docs:   0.3820
  At 1000 docs:   0.3280
R-Precision (precision after R (= num_rel for a query) docs retrieved):
    Exact:        0.3739
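To make these evaluation numbers easier to interpret, here is a minimal sketch, in Python, of how trec_eval-style measures such as those in the table above (precision at a fixed document cutoff, R-precision, and non-interpolated average precision) can be computed from a single ranked list; the interpolated figures additionally take, at each recall level, the maximum precision observed at that or any higher recall. The sketch is ours, not part of the original experiment, and the function names and toy data are purely illustrative.

# Illustrative sketch (not from the original experiment) of the standard
# trec_eval measures reported in the tables in this section.
# `ranking` is a list of document IDs ordered by the router's score;
# `relevant` is the set of document IDs judged relevant for the topic.

def precision_at(ranking, relevant, k):
    """Precision after the top k retrieved documents."""
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k

def r_precision(ranking, relevant):
    """Precision after R documents, where R = number of relevant documents."""
    return precision_at(ranking, relevant, len(relevant))

def average_precision(ranking, relevant):
    """Non-interpolated average precision: precision at the rank of each
    relevant document retrieved, averaged over all relevant documents
    (relevant documents that are never retrieved contribute zero)."""
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Toy example (hypothetical data, not the Topic 52 run):
ranking = ["d3", "d7", "d1", "d9", "d2"]
relevant = {"d3", "d2", "d8"}
print(precision_at(ranking, relevant, 5))    # 0.4
print(r_precision(ranking, relevant))        # precision after R = 3 docs
print(average_precision(ranking, relevant))  # approximately 0.4667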
Here we see that the topic tree for Topic 52 does well - recall is excellent and precision is sustained even at high recall levels. In contrast, the topic tree for Topic 54 produces the following results:

Queryid (Num):        54
Total number of documents over all queries
    Retrieved:      1000
    Relevant:         65
    Rel_ret:          64
Interpolated Recall - Precision Averages:
    at 0.00       0.1323
    at 0.10       0.1323
    at 0.20       0.1323
    at 0.30       0.1323
    at 0.40       0.1323
    at 0.50       0.1323
    at 0.60       0.1323
    at 0.70       0.1102
    at 0.80       0.1102
    at 0.90       0.1097
    at 1.00       0.0000
Average precision (non-interpolated) over all rel docs
                  0.0907
Precision:
  At    5 docs:
  At   10 docs: