CLARIT TREC Design, Experiments, and Results

David A. Evans, Robert G. Lefferts, Gregory Grefenstette, Steven K. Handerson, William R. Hersh, Armar A. Archbold
Laboratory for Computational Linguistics
Department of Philosophy
Carnegie Mellon University

1 Introduction

This report presents an abbreviated description[1] of the approach and the results of the CLARIT team in completing the tasks of the "Text Retrieval Conference" (TREC) organized by the National Institute of Standards and Technology (NIST) and the Defense Advanced Research Projects Agency (DARPA) in 1992.[2]

1.1 A Characterization of the TREC Tasks

TREC activities required participants to `retrieve' 200 documents for each of 100 different `topics' from a large database of full-text documents. Each topic was given as a one-page description of an item of interest. This feature of the TREC tasks was somewhat unusual, at least compared to many traditional `bibliographic' retrieval evaluations, in which the topic or `query' is a minimal, often telegraphic, single-phrase statement of a `subject' or `an interest'.[3] The principal distinguishing features of the TREC tasks, however, were (1) their scale, involving a total of approximately 2 gigabytes of text, representing approximately 750,000 full-text documents of varying length, and (2) the careful attention of the organizers to evaluating the results submitted by each participating group.

More specifically, TREC tasks were designed to simulate two general types of information-retrieval situations, "routing" and "ad-hoc" querying. "Routing" corresponds to situations in which a topic is possibly well documented (e.g., with examples) and the user wishes to find more documents like those; an illustrative sketch contrasting the two modes follows the notes at the end of this section. In the case of the TREC tasks, 50 topics were designated as "routing" topics; each was accompanied by a set of documents judged to be "relevant" to the topic.[4] The first installment of the full set of documents, representing approximately 1.1 gigabytes of text, was available to each team for use in identifying possible other relevant documents for each routing topic.

----------
[1] A more complete and detailed description of the CLARIT-TREC activities and results is available as a technical report [Evans et al., in preparation].
[2] The TREC activities were organized at the end of 1991. Data was made available in the Spring of 1992. All processing results were submitted to NIST by September 1, 1992. The "Conference" itself, a workshop involving the approximately two dozen groups that submitted partial or full processing results, took place on November 4-6, 1992, in Rockville, MD.
[3] The longer statements of `topics' in TREC were arguably more interesting as a test of systems and more representative of many contemporary information-seeking situations. See Figure 9 for a sample topic statement.
[4] The number of sample relevant documents varied greatly from topic to topic. Some topics had almost 100 sample relevant documents; others had only about ten.
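To make the routing/ad-hoc distinction concrete, the sketch below is our illustration only, not the CLARIT method or any part of the TREC protocol: it models a routing topic as a standing filter applied to a stream of arriving documents, and an ad-hoc topic as a one-time ranked search of a fixed collection. The term-overlap scoring, the function names, and the threshold are all hypothetical assumptions.

    # Minimal sketch of TREC's two task types. The scoring is a crude
    # term-overlap count, assumed here for illustration only; actual TREC
    # systems used far richer document and query representations.

    def score(query_terms, document):
        """Count how many distinct query terms occur in the document."""
        words = set(document.lower().split())
        return sum(1 for t in query_terms if t in words)

    def adhoc_search(query_terms, collection, k=200):
        """Ad-hoc querying: rank a fixed collection for a newly issued topic."""
        ranked = sorted(collection, key=lambda d: score(query_terms, d),
                        reverse=True)
        return ranked[:k]  # TREC asked for 200 documents per topic

    def route(query_terms, document_stream, threshold=2):
        """Routing: a standing topic filters each document as it arrives."""
        for doc in document_stream:
            if score(query_terms, doc) >= threshold:
                yield doc

    if __name__ == "__main__":
        topic = ["merger", "acquisition", "antitrust"]
        docs = [
            "Regulators opened an antitrust review of the proposed merger.",
            "Weekend weather: sunny with a chance of showers.",
        ]
        print(adhoc_search(topic, docs, k=1))  # best-matching document
        print(list(route(topic, iter(docs))))  # documents passing the filter

In the actual evaluation, each system returned a ranked list of 200 documents per topic; the toy threshold and scoring above merely stand in for the much more sophisticated processing described in the remainder of this report.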