SP500207 NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1) Classification Trees for Document Routing, A Report on the TREC Experiment chapter R. Tong A. Winkler P. Gage National Institute of Standards and Technology Donna K. Harman

<top>
<head> Tipster Topic Description
<num> Number: 001
<dom> Domain: International Economics
<title> Topic: Antitrust Cases Pending
<desc> Description:
Document discusses a pending antitrust case.
<narr> Narrative:
To be relevant, a document will discuss a pending antitrust case and will identify the alleged violation as well as the government entity investigating the case. Identification of the industry and the companies involved is optional. The antitrust investigation must be a result of a complaint, NOT as part of a routine review.
<con> Concept(s):
1. antitrust suit, antitrust objections, antitrust investigation, antitrust dispute
2. monopoly, bid-rigging, illegal restraint of trade, insider trading, price-fixing
3. acquisition, merger, takeover, buyout
4. Federal Trade Commission (FTC), Interstate Commerce Commission (ICC), Justice Department, U.S. Securities and Exchange Commission (SEC), Japanese Fair Trade Commission
5. NOT antitrust immunity
<fac> Factor(s)
<def> Definition(s):
Antitrust - Laws to protect trade and commerce from unlawful restraints and monopolies or unfair business practices.
Acquisition - The taking over by one company of a controlling interest in another, also called a takeover. The action may be friendly or unfriendly.
Merger - The acquisition by one corporation of the stock of another. The acquiring corporation then retires the other's stock and dissolves that corporation. Therefore, only one corporation retains its identity in a merger.
</top>

Since this is a very comprehensive description, it contains many topic-specific words.
Recognizing this, our approach is to:

* extract the <desc>, <narr>, <con>, and <def> fields from the topic specification,
* concatenate them, removing the SGML-style tags and the field labels (i.e., the Description:, Narrative:, Concept(s):, and Definition(s): strings),
* map all the words into lowercase, remove duplicates and stop words² from the resulting description, and use the remaining list of words as the set of features.

2. We used a stop word list published by Fox [4] that contains 421 words.
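The feature-extraction steps listed above can be sketched in Python as follows. This is only an illustrative sketch, not the implementation used in the experiment: the function name `extract_features`, the tokenization rule (runs of letters, optionally joined by hyphens or apostrophes), and the tiny `STOP_WORDS` set are all assumptions made here. In particular, `STOP_WORDS` is a stand-in for the 421-word list of Fox [4], which is not reproduced.

```python
import re

# Fields whose text contributes features, and the labels stripped from them.
WANTED_FIELDS = {"desc", "narr", "con", "def"}
FIELD_LABELS = ("Description:", "Narrative:", "Concept(s):", "Definition(s):")

# ASSUMPTION: a tiny illustrative stop list, NOT the 421-word list of Fox [4].
STOP_WORDS = {"a", "an", "and", "as", "be", "by", "in", "is", "of", "the", "to", "will"}


def extract_features(topic: str) -> list[str]:
    """Return the ordered, de-duplicated, lowercased non-stop words of a topic."""
    # Split on SGML-style tags; the capture group keeps the tag names, so
    # parts = [text before first tag, tag, text, tag, text, ...].
    parts = re.split(r"<\s*(/?\w+)\s*>", topic)
    chunks = [
        text
        for tag, text in zip(parts[1::2], parts[2::2])
        if tag.lower() in WANTED_FIELDS
    ]

    # Concatenate the selected fields and drop the field labels.
    text = " ".join(chunks)
    for label in FIELD_LABELS:
        text = text.replace(label, " ")

    # ASSUMPTION: a "word" is a run of letters, optionally joined by '-' or "'".
    words = re.findall(r"[a-z]+(?:[-'][a-z]+)*", text.lower())

    # Remove stop words and duplicates, keeping first-occurrence order.
    seen = set()
    features = []
    for word in words:
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            features.append(word)
    return features


if __name__ == "__main__":
    sample = (
        "<top> <num> Number: 001 <title> Topic: Antitrust Cases Pending "
        "<desc> Description: Document discusses a pending antitrust case. "
        "<narr> Narrative: To be relevant, a document will discuss a pending "
        "antitrust case. </top>"
    )
    print(extract_features(sample))
    # ['document', 'discusses', 'pending', 'antitrust', 'case', 'relevant', 'discuss']
```

Note that the <num> and <title> fields are ignored, so words that appear only there (for example "Cases") do not become features in this sketch.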