NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Classification Trees for Document Routing, A Report on the TREC Experiment
R. Tong, A. Winkler, P. Gage
National Institute of Standards and Technology, Donna K. Harman

also had only one decision node, but correctly identified 94 of 149 relevant documents:

    azt <= 0.50
      yes: class 0 (0.126)
      no:  class 1 (1.000)

Thus relevant documents are those that contain the word azt (the name of a drug for treating AIDS patients).

4.2 The Effect of Additional Training Data

Our second set of official results was designed to investigate the sensitivity of system performance to the size of the training sets. To provide additional training samples, we randomly selected a block of 50 Wall Street Journal articles[7], for which we generated relevance judgements for the first 25 topics. In practice this gave us some additional relevant articles, but mostly contributed to the non-relevant examples. Table 2 shows the effect of adding the additional documents. Notice that some articles in the new set were already included in the original training data, so the training-set totals do not always grow by exactly 50.

Table 2: Size of Training Sets and Optimal Trees

           Size of Optimal Tree     Size of Training Set
                                    adsba1           adsba2
Topic #    adsba1    adsba2         Total    Rel     Total    Rel
 1          7         7             34        7      83        7a
 2          8        14             31        7      81        8
 3          4         5             30        9      80        9
 4          3         5             29       12      79       12
 5          3         3             30       18      79       18a
 6          3         2             28       15      78       15
 7          2         3             28       10      78       10
 8          1        19             32        7      82        8
 9          4         4             23        4      73        4
10          2         4             20       12      69       12b
11          2         2             21       12      71       12
12          8        11             35        8      85        8
13          2         2             12        8      62        8
14         10        13             30        5      80        6

[7] The articles were in the block starting with WSJ870311-0102 and ending with WSJ870324-0001.
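As an illustration of the one-node tree for the azt topic and the 94-of-149 figure quoted before Section 4.2, the following Python sketch applies a single threshold test to one term feature and computes the corresponding recall. It is not the authors' implementation. It assumes that the feature is the number of whole-word occurrences of the term in a document, which the text does not state. The function names and the two sample sentences are hypothetical.

```python
import re


def term_count(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of `term` in `text`.

    Assumption: the tree's feature is a raw occurrence count.
    """
    pattern = rf"\b{re.escape(term)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def classify(text: str, term: str = "azt", threshold: float = 0.5) -> int:
    """One-node tree: class 0 if feature <= threshold, otherwise class 1."""
    return 0 if term_count(text, term) <= threshold else 1


def recall(found_relevant: int, total_relevant: int) -> float:
    """Fraction of the relevant documents that were identified."""
    return found_relevant / total_relevant


# Hypothetical sample texts, not taken from the TREC collection.
docs = [
    "The company said AZT sales rose last quarter.",
    "Stocks closed higher on Tuesday.",
]
print([classify(d) for d in docs])   # [1, 0]
print(round(recall(94, 149), 3))     # 0.631
```

With a threshold of 0.50 on a count-valued feature, the test simply separates documents with no occurrence of the term from those with at least one, which matches the reading that relevant documents are those containing the word azt. Identifying 94 of 149 relevant documents corresponds to a recall of about 63%.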