NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Overview of the First Text REtrieval Conference (TREC-1)
Donna K. Harman
National Institute of Standards and Technology
2. The Participants
There were 25 participating systems in TREC-1, using a wide range of retrieval techniques. The participants
were able to choose from three levels of participation: Category A, full participation; Category B, full participa-
tion using a reduced dataset (25 topics and 1/4 of the full document set); and Category C, evaluation only (to
allow commercial systems to protect proprietary algorithms). The program committee selected only 20 Category
A and B groups to present talks because of limited conference time, and requested that the rest of the groups
present posters. All groups were asked to submit papers for the proceedings.
Each group was provided with the data and asked to turn in either one or two sets of results for each topic.
When two sets of results were sent, they could be made using different methods of creating queries (Methods 1,
2, or 3), or by using different parameter settings for one query creation method. Groups could choose to do the
routing task, the adhoc task, or both, and were requested to submit the top 200 documents retrieved for each
topic for evaluation.
3. The Test Collection
3.1 Introduction
Critical to the success of TREC was the creation of the test collection. Like most traditional retrieval collec-
tions, there are three distinct parts to this collection. The first is the documents themselves -- the training set
(D1) and the test set (D2). Both were distributed as CD-ROMs with about 1 gigabyte of data each, compressed
to fit. The training topics, the test topics, and the relevance judgments were supplied by email. TREC-1 used
the same test collection (documents and topics) used in the DARPA TIPSTER project. (The DARPA TIPSTER
project involves the same tasks as TREC, but with four contractors doing more intense research than is being
expected from TREC participants (Harman 1993).) However, a major increase in the number of relevance judg-
ments for this collection became available from the TREC-1 evaluation.
The components of the test collection -- the documents, the topics, and the relevance judgments -- are dis-
cussed in the rest of this section.
3.2 The Documents
The documents came from the following sources:

* Disk 1
  -- Wall Street Journal (1986, 1987, 1988, 1989)
  -- AP Newswire (1989)
  -- Information from Computer Select disks (Ziff-Davis Publishing)
  -- Federal Register (1989)
  -- Short abstracts from the Department of Energy
* Disk 2
  -- Wall Street Journal (1990, 1991, 1992)
  -- AP Newswire (1988)
  -- Information from Computer Select disks (Ziff-Davis Publishing)
  -- Federal Register (1988)
The particular sources were selected because they reflected the different types of documents used in the
imagined TREC application. Specifically, they varied in length, writing style, level of editing, and
vocabulary. All participants were required to sign a detailed user agreement for the data in
order to protect the copyrighted source material.