NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)
Edited by Donna K. Harman, National Institute of Standards and Technology
CLARIT TREC Design, Experiments, and Results
David A. Evans, Robert G. Lefferts, Gregory Grefenstette,
Steven K. Handerson, William R. Hersh, Armar A. Archbold
Laboratory for Computational Linguistics
Department of Philosophy
Carnegie Mellon University
1 Introduction
This report presents an abbreviated description1 of the approach and the results of the CLARIT
team in completing the tasks of the "Text REtrieval Conference" (TREC) organized by the
National Institute of Standards and Technology (NIST) and the Defense Advanced Research
Projects Agency (DARPA) in 1992.2
1.1 A Characterization of the TREC Tasks
TREC activities required participants to `retrieve' 200 documents for each of 100 different
`topics' from a large database of full-text documents. Each topic was given as a one-page
description of an item of interest. This feature of the TREC tasks was somewhat unusual, at
least compared to many traditional `bibliographic' retrieval evaluations, in which the topic or
`query' is a minimal, often telegraphic, single-phrase statement of a `subject' or `an interest'.3
However, the principal distinguishing features of the TREC tasks were (1) their scale, involving
a total of approximately 2 gigabytes of text, representing approximately 750,000 full-text
documents of varying length, and (2) the careful attention of the organizers in evaluating the
results submitted by each participating group.
More specifically, TREC tasks were designed to simulate two general types of information
`retrieval' situations, "routing" and "ad-hoc" querying. "Routing" corresponds to situations in
which a topic is possibly well documented (e.g., with examples) and the user desires to find
more similar documents. In the case of TREC tasks, 50 topics were designated as "routing"
topics; each was accompanied by a set of documents judged to be "relevant" to the topic.4 The
first installment of the full set of documents, representing approximately 1.1 gigabytes of text,
was available to each team for use in identifying other possibly relevant documents for each
1A more complete and detailed description of the CLARIT-TREC activities and results is available as a
technical report, [Evans et al. in preparation].
2The TREC activities were organized at the end of 1991. Data was made available in the Spring of 1992.
All processing results were submitted to NIST by September 1, 1992. The "Conference" itself, a Workshop
involving the approximately two dozen groups that submitted partial or full processing results, took place on
November 4-6, 1992, in Rockville, MD.
3The longer statements of `topics' in TREC were arguably more interesting as a test of systems and more
representative of many contemporary information-seeking situations. See Figure 9 for a sample topic statement.
4The number of sample relevant documents varied greatly from topic to topic. Some topics had almost 100
sample relevant documents; others had only about ten.