NIST Special Publication 500-207: The First Text REtrieval Conference (TREC-1)

Overview of the First Text REtrieval Conference (TREC-1)

Donna K. Harman
National Institute of Standards and Technology

2. The Task

2.1 Introduction

TREC is designed to encourage research in information retrieval using large data collections. Two types of retrieval are being examined -- retrieval using an "adhoc" query such as a researcher might use in a library environment, and retrieval using a "routing" query such as a profile used to filter some incoming document stream.

The TREC task is not tied to any given application, and is not concerned with interfaces or optimized response time for searching. However, it is helpful to have some potential user in mind when designing or testing a retrieval system. The model for a user in TREC is a dedicated searcher, not a novice searcher, and the model for the application is one needing monitoring of data streams for information on specific topics (routing), along with the ability to do adhoc searches on archived data for new topics. It should be assumed that the users need the ability to do both high precision and high recall searches, and are willing to look at many documents and repeatedly modify queries in order to get high recall. Obviously they would like a system that makes this as easy as possible, but this ease should be reflected in TREC as added intelligence in the system rather than as special interfaces.

Since TREC has been designed to evaluate system performance both in a routing (filtering or profiling) mode and in an adhoc mode, both functions need to be tested. The test design was based on traditional information retrieval models, and evaluation used traditional recall and precision measures. The following diagram of the test design shows the various components of TREC (fig. 1).
[Figure 1. The TREC Task. The diagram shows the 50 training topics yielding Q1 (training queries) and Q2 (routing queries), and the test topics yielding Q3 (ad-hoc queries), run against 1 gigabyte of training documents (D1) and 1 gigabyte of test documents (D2).]

This diagram reflects the four data sets (2 sets of topics and 2 sets of documents) that were provided to participants. These data sets (along with a set of sample relevance judgments for the 50 training topics) were used to