SP500207
NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Classification Trees for Document Routing, A Report on the TREC Experiment
chapter
R. Tong
A. Winkler
P. Gage
National Institute of Standards and Technology
Donna K. Harman
<top>
<head> Tipster Topic Description
<num> Number: 001
<dom> Domain: International Economics
<title> Topic: Antitrust Cases Pending
<desc> Description:
Document discusses a pending antitrust case.
<narr> Narrative:
To be relevant, a document will discuss a pending antitrust case
and will identify the alleged violation as well as the government
entity investigating the case. Identification of the industry and
the companies involved is optional. The antitrust investigation
must be a result of a complaint, NOT as part of a routine review.
<con> Concept(s):
1. antitrust suit, antitrust objections, antitrust investigation,
antitrust dispute
2. monopoly, bid-rigging, illegal restraint of trade, insider
trading, price-fixing
3. acquisition, merger, takeover, buyout
4. Federal Trade Commission (FTC), Interstate Commerce Commission
(ICC), Justice Department, U.S. Securities and Exchange Commission
(SEC), Japanese Fair Trade Commission
5. NOT antitrust immunity
<fac> Factor(s)
<def> Definition(s):
Antitrust - Laws to protect trade and commerce from unlawful
restraints and monopolies or unfair business practices.
Acquisition - The taking over by one company of a controlling
interest in another, also called a takeover. The action may be
friendly or unfriendly.
Merger - The acquisition by one corporation of the stock of
another. The acquiring corporation then retires the other's stock
and dissolves that corporation. Therefore, only one corporation
retains its identity in a merger.
</top>
Because this description is very comprehensive, it contains many topic-specific words.
Recognizing this, our approach is to:
* extract the <desc>, <narr>, <con>, and <def> fields from the topic
specification,
* concatenate them, removing the SGML-style tags and the field labels (i.e.,
the Description:, Narrative:, Concept(s):, and Definition(s):
strings),
* map all the words to lowercase, remove duplicates and stop words (see note 2)
from the resulting description, and
* use the remaining list of words as the set of features.
2. We used a stop word list published by Fox [4] that contains 421 words.
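The feature-extraction steps listed above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `extract_features`, the regular-expression tokenizer (which splits hyphenated words and drops digits), and the tiny `STOP_WORDS` set are all assumptions made for this example. The actual experiment used the 421-word stop list published by Fox [4], which is not reproduced here.

```python
import re

# Hypothetical stand-in for the 421-word Fox stop list [4]; illustration only.
STOP_WORDS = {"a", "an", "and", "as", "be", "by", "is", "of", "or", "the", "to", "will"}

# Fields kept from the topic specification, and the labels stripped from them.
FIELDS = ("desc", "narr", "con", "def")
LABELS = ("Description:", "Narrative:", "Concept(s):", "Definition(s):")


def extract_features(topic):
    """Return the ordered list of unique, non-stop words from the kept fields."""
    # Splitting on a capturing group yields [preamble, tag1, body1, tag2, body2, ...].
    parts = re.split(r"<(/?\w+)>", topic)
    kept = []
    for tag, body in zip(parts[1::2], parts[2::2]):
        if tag in FIELDS:
            # Remove the field label that follows each SGML-style tag.
            for label in LABELS:
                body = body.replace(label, " ")
            kept.append(body)
    # Assumed tokenizer: lowercase, then take maximal runs of letters.
    words = re.findall(r"[a-z]+", " ".join(kept).lower())
    seen = set()
    features = []
    for word in words:
        # Drop stop words and duplicates, keeping first-occurrence order.
        if word not in STOP_WORDS and word not in seen:
            seen.add(word)
            features.append(word)
    return features
```

Calling `extract_features` on the full text of a topic such as Topic 001 returns one feature per distinct remaining word. Text in fields outside the four listed ones (for example <num>, <dom>, and <title>) is ignored.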