NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1), edited by Donna K. Harman, National Institute of Standards and Technology.

Classification Trees for Document Routing: A Report on the TREC Experiment
R. Tong, A. Winkler, P. Gage

system infrastructure needed to handle the TREC data and to produce the official results. We used an "off-the-shelf" version of CART (written in C). The data were stored in compressed form and only uncompressed as needed. This was a satisfactory strategy for tree construction, since the training sets involved relatively few documents. However, the time overhead was unacceptable when we came to do the actual test classifications.

4. Official Results and Performance Analysis

ADS submitted two sets of results for the routing queries. In the first set (denoted adsba1) we used the classification trees generated from exactly the information provided by the training data. In the second set (denoted adsba2) we used trees generated from an augmented training data set. To generate this additional data we randomly selected a further block of 50 Wall Street Journal articles from the training corpus, and one of us then made the relevance judgments with respect to the 25 topics. These data were added to the original collection of relevance judgments to provide a larger training set from which the second set of trees was grown. As noted above, we used a set of priors that reflects the low density of relevant documents, together with a cost function that favors recall over precision. We also performed a number of auxiliary tests to help with our interpretation of the official results. These are all described in the following sections.

4.1 The Baseline Experiment

The baseline experiment (adsba1) was designed to explore how well our approach could do with absolutely no manual intervention and with the minimum of training data.
So for this experiment we used just those documents in the training set for which there were relevance judgments. Table 1 shows the performance on the baseline experiment together with the performance of the other Category B systems. We have chosen to show only the number of relevant-retrieved documents at the 200-document cutoff, since we believe this gives a more accurate picture of the system's ability to perform document routing than do the precision and recall numbers.

Table 1: Performance on Baseline Experiment (relevant-retrieved at 200-document cutoff)

  Topic#   #Rel   adsba1   Max   Median   Min
    1       131      2      67     32      2
    2       172     15      33     21      9
    3       304      3     130     48      3
    4        20      1      18      7      1
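The relevant-retrieved count at a fixed cutoff, as reported in Table 1, is straightforward to compute from a ranked output list. The following is a minimal sketch (the document identifiers are hypothetical, not drawn from the TREC data):

```python
def relevant_retrieved_at(ranking, relevant, cutoff=200):
    """Count relevant documents among the top `cutoff` entries of a ranking.

    ranking  -- list of document ids, best first
    relevant -- set of document ids judged relevant for the topic
    """
    return sum(1 for doc in ranking[:cutoff] if doc in relevant)

# Toy example with a cutoff of 3: only d1 appears in the top 3,
# so the count is 1 even though d2 is also relevant.
ranking = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2", "d9"}
print(relevant_retrieved_at(ranking, relevant, cutoff=3))  # -> 1
```

A fixed-cutoff count like this sidesteps the interpolation involved in precision/recall averages, which is one reason a routing task is often summarized this way.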
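The effect of the priors and cost function mentioned in Section 4 can be illustrated with the standard CART leaf-labeling rule: a leaf is assigned the class with the lower expected misclassification cost, where node class probabilities are reweighted by the priors. This is a generic sketch of that rule, not ADS's actual implementation; all the numbers below are invented for illustration:

```python
def leaf_decision(n_rel, n_nonrel, prior_rel, prior_nonrel,
                  cost_miss, cost_false_alarm, total_rel, total_nonrel):
    """CART-style leaf labeling with class priors and misclassification costs.

    n_rel, n_nonrel         -- counts of each class reaching this leaf
    prior_rel, prior_nonrel -- assumed class priors
    cost_miss               -- cost of labeling a relevant doc non-relevant
    cost_false_alarm        -- cost of labeling a non-relevant doc relevant
    total_rel, total_nonrel -- class totals in the training set
    """
    # Prior-reweighted class probabilities at the node.
    p_rel = prior_rel * (n_rel / total_rel)
    p_non = prior_nonrel * (n_nonrel / total_nonrel)
    # Label "relevant" when the expected cost of a miss exceeds that
    # of a false alarm; a large cost_miss pushes the tree toward recall.
    return "relevant" if cost_miss * p_rel > cost_false_alarm * p_non else "non-relevant"
```

With symmetric costs, `leaf_decision(5, 20, 0.1, 0.9, 1, 1, 100, 900)` labels the leaf non-relevant; raising the miss cost to 5 flips it to relevant, which is the recall-over-precision bias the text describes.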