NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Classification Trees for Document Routing, A Report on the TREC Experiment
R. Tong, A. Winkler, P. Gage
National Institute of Standards and Technology
Donna K. Harman

For example, the tree for Topic 22 "Counternarcotics" becomes:

        class 0 (0.000)
    drug<=0.50
        class 1 (0.905)

Thus, although the tree size is unchanged, the test is now on the word drug instead of coca. As we might expect, this turns out to be a much less generally useful test, and the tree identifies only 8 of the relevant articles, which is actually the minimum retrieved by a Category B system.

The tree for Topic 15 "CEO" is one which shows marked change from the adsba1 version, growing from two decision nodes to twelve, and retrieving 49 relevant documents instead of 29. The tree is:

        class 0 (0.000)
    chief<=0.50
        class 0 (0.000)
    executive<=0.50
        class 1 (1.000)
    company<=0.50
        class 0 (0.000)
    executive<=1.50
        class 0 (0.000)
    name<=0.50
        class 1 (1.000)
    company<=1.50
        class 0 (0.000)
    resign<=0.50
        class 0 (0.000)
    chief<=1.50
        class 1 (1.000)
    name<=3.50
        class 0 (0.000)
    executive<=9.00
        class 0 (0.000)
    appoint<=0.50
        class 0 (0.000)
    ceo<=0.50
        class 0 (0.000)

This is a much more complex structure than the other trees illustrated so far. Note that there are three terminal nodes that lead to a document being classified as relevant. Although they are in the same sub-tree, defined by the expression:

    chief>0 & ceo & ~appoint & executive>0 & executive<10 & name<4

they make minor distinctions based on the words company, name, resign and chief. Thus we have three further tests:

    executive<2 & company
    executive>1 & company>1 & resign>0 & chief>1
    executive>1 & company<2 & name>0
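
The rules read off these two trees can be sketched as plain boolean tests over per-document word counts. The following is a minimal illustration, not the authors' implementation. It follows the conjunctions as they are written in the prose, and it assumes that a bare word such as ceo or company means "count greater than zero" and that a negated word means "count equal to zero". The function names and the whitespace tokenizer are hypothetical; the paper's actual feature extraction (for example, any stemming behind words like appoint and resign) is not reproduced here.

```python
from collections import Counter


def term_counts(text):
    """Count lowercase whitespace-separated tokens.

    Assumption: a crude stand-in for the paper's feature extraction.
    """
    return Counter(text.lower().split())


def topic22_relevant(counts):
    # Single test from the Topic 22 tree:
    # drug<=0.50 leads to class 0, otherwise class 1.
    return counts["drug"] > 0.5


def topic15_relevant(counts):
    # Shared sub-tree condition quoted in the text:
    # chief>0 & ceo & ~appoint & executive>0 & executive<10 & name<4
    shared = (
        counts["chief"] > 0
        and counts["ceo"] > 0          # assumption: bare word means count > 0
        and counts["appoint"] == 0     # assumption: negation means count == 0
        and 0 < counts["executive"] < 10
        and counts["name"] < 4
    )
    if not shared:
        return False

    executive = counts["executive"]
    company = counts["company"]
    # The three further tests, each leading to a "relevant" terminal node.
    return (
        (executive < 2 and company > 0)
        or (executive > 1 and company > 1
            and counts["resign"] > 0 and counts["chief"] > 1)
        or (executive > 1 and company < 2 and counts["name"] > 0)
    )
```

A Counter returns zero for words that never occur, so a document that lacks a tested word simply fails any "greater than zero" test without special handling.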