NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Classification Trees for Document Routing, A Report on the TREC Experiment
R. Tong
A. Winkler
P. Gage
National Institute of Standards and Technology
Donna K. Harman
system infrastructure needed to handle the TREC data and to produce the
official results. We used an "off-the-shelf" version of CART (written in C).
* The data were stored in compressed form and uncompressed only as needed. This was a satisfactory strategy for tree construction, since the training sets involved relatively few documents. However, the time overhead was unacceptable when we came to do the actual test classifications.
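The storage strategy described above can be sketched in a few lines. This is an illustrative reconstruction, not the original system's code: documents are kept gzip-compressed on disk and decompressed only when one is actually needed, trading per-access time for disk space. File names and helper functions are invented for the example.

```python
import gzip
import os
import tempfile

def store_document(path: str, text: str) -> None:
    """Write a document to disk in compressed form."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        f.write(text)

def read_document(path: str) -> str:
    """Decompress a single document on demand."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return f.read()

# Invented example document; the original corpus was Wall Street Journal text.
tmp = tempfile.mkdtemp()
doc_path = os.path.join(tmp, "wsj0001.gz")
store_document(doc_path, "Example Wall Street Journal article text.")
print(read_document(doc_path))
```

The per-document decompression cost is negligible during training over a few hundred documents, but it multiplies across the full test collection, which matches the overhead problem noted above.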
4. Official Results and Performance Analysis
ADS submitted two sets of results for the routing queries. In the first set (denoted adsba1) we used the classification trees generated from exactly the information provided by the training data. In the second set (denoted adsba2) we used trees generated from an augmented training set. To generate this additional data we randomly selected an additional block of 50 Wall Street Journal articles from the training corpus, and then one of us made the relevance judgments with respect to the 25 topics. These data were then added to the original collection of relevance judgments to provide a larger training set from which the second set of trees was grown. As noted above, we used a set of priors that reflects the low density of relevant documents, together with a cost function that encourages recall over precision. We also performed a number of auxiliary tests to help with our interpretation of the official results. These are all described in the following sections.
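The effect of priors and a recall-oriented cost function can be sketched with a modern tree learner. This is not the original CART configuration: the data are synthetic, and scikit-learn's `class_weight` parameter stands in for CART's priors and misclassification costs, with an assumed 10:1 cost ratio chosen purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a routing topic: relevant documents (label 1)
# are rare, as in the TREC training data. Features are invented.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.8).astype(int)  # only a few positives

# class_weight plays the role of the priors and cost function described
# above: missing a relevant document is made ten times (an assumed ratio)
# more expensive than a false alarm, pushing the tree toward recall at
# the expense of precision.
recall_oriented = DecisionTreeClassifier(
    max_depth=2, class_weight={0: 1.0, 1: 10.0}, random_state=0
).fit(X, y)
default_cost = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

recall_w = recall_oriented.predict(X)[y == 1].mean()
recall_d = default_cost.predict(X)[y == 1].mean()
print(f"recall-oriented tree: {recall_w:.2f}, default tree: {recall_d:.2f}")
```

With so few positives, an unweighted tree tends to label ambiguous regions non-relevant; up-weighting the relevant class shifts the split thresholds so that more relevant documents are retrieved.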
4.1 The Baseline Experiment
The baseline experiment (adsba1) was designed to explore how well our approach could do with absolutely no manual intervention and with the minimum of training data. For this experiment, therefore, we used just those documents in the training set for which there were relevance judgments.
Table 1 shows the performance on the baseline experiment together with the performance of the other Category B systems. We have chosen to show only the number of relevant-retrieved documents at the 200-document cut-off point, since we believe that this gives a more accurate picture of the ability of the system to perform document routing than do the precision and recall numbers5.
Table 1: Performance on Baseline Experiment

                         Rel-Ret @ 200
  Topic#   #Rel   adsba1    Max   Median   Min
       1    131        2     67       32     2
       2    172       15     33       21     9
       3    304        3    130       48     3
       4     20        1     18        7     1
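The Rel-Ret @ 200 figures reported in Table 1 can be computed from a system's ranked output with a simple cutoff count. A minimal sketch follows; the document IDs and relevance judgments are invented for illustration.

```python
def rel_ret_at_cutoff(ranked_docs, relevant, cutoff=200):
    """Number of relevant documents among the top `cutoff` of a ranking."""
    return sum(1 for doc in ranked_docs[:cutoff] if doc in relevant)

# Tiny invented example: five documents ranked by a system, three of
# which were judged relevant for the topic.
ranking = ["d3", "d1", "d7", "d2", "d9"]
judged_relevant = {"d1", "d2", "d5"}
print(rel_ret_at_cutoff(ranking, judged_relevant, cutoff=3))  # → 1
```

Unlike precision and recall at a fixed cutoff, this raw count is easy to compare directly against the per-topic maximum, median, and minimum across systems, as Table 1 does.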