NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Classification Trees for Document Routing, A Report on the TREC Experiment
R. Tong, A. Winkler, P. Gage
National Institute of Standards and Technology
Donna K. Harman

Table 2: Size of Training Sets and Optimal Trees

           Size of Optimal Tree      Size of Training Set
                                     adsba1          adsba2
Topic #    adsba1    adsba2          Total   Rel     Total   Rel
15         3         13              27      7       77      10
16         2         3               36      3       86      3
17         2         2               30      12      80      12
18         2         3               18      14      68      14
19         2         3               30      12      80      15
20         2         3               30      14      80      14
21         2         2               33      12      82      12b
22         2         2               28      10      78      10
23         2         2               29      10      79      10
24         2         2               48      10      98      10
25         2         2               39      6       89      6

a. Some relevant documents in the augmented training set already in the original training set.
b. Some non-relevant documents in the augmented training set already in the original training set.

The new training data did not have a significant impact on the size of the optimal tree, except in those cases where there were additional relevant documents, that is, for topics 6, 8, 14, 15 and 19. The changes here were quite dramatic. For example, in the case of Topic 8, "Economic Projections", the addition of just one more relevant article changed the optimal tree from one with only one terminal node to one with 19! This suggests, of course, that for this topic the training data do not provide a very representative sample of texts.

Of more interest, however, is whether these additional training data had any effect on the overall performance of the system. Table 3 shows the official results for adsba2. The results for adsba1 are retained for comparison, as are the results for the other Category B systems.

Table 3: Performance with Additional Training Data

                    Rel-Ret @ 200
Topic #    #Rel     adsba1    adsba2    Max    Median    Min
1          131      2         25        67     32        2
2          172      15        9         33     21        9