|TIPSTER Text Program
A multi-agency, multi-contractor program
Date created: Monday, 31-Jul-00
Text REtrieval Conference (TREC)
Text REtrieval Conferences 1 - 5
For more information regarding Text REtrieval Conferences (TREC), please visit the TREC web site at http://trec.nist.gov.
There have been six Text REtrieval Conferences (TRECs): TREC-1 in November 1992, TREC-2 in August 1993, TREC-3 in November 1994, TREC-4 in November 1995, TREC-5 in November 1996, and TREC-6 in 1997; TREC-7 will be held in 1998. The number of participating systems grew from 25 in TREC-1 to 35 in TREC-5, including most of the major text retrieval software companies and most of the universities doing research in text retrieval (see table). The diversity of the participating groups has ensured that TREC represents many different approaches to text retrieval, while the emphasis on individual experiments evaluated in a common setting has proven to be a major strength of TREC.
The test design and test collection used for document detection in TIPSTER were also used in TREC. Participants ran the various tasks, sent their results to NIST for evaluation, presented the results at the TREC conferences, and submitted papers for the proceedings [Harman 1,2]. The test collection consists of over 1 million documents from diverse full-text sources, 250 topics, and the set of relevant documents, or "right answers," for those topics. A Spanish collection with a total of 50 topics was built and used in TREC-3 and TREC-4, and a Chinese track was added in TREC-5.
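The evaluation cycle described above (run each topic against the collection, submit a ranked list, and score it against the known relevant documents) can be sketched in miniature as follows. This is a simplified illustration, not NIST's actual scoring software, which reports many more measures; the document IDs and judgments in the usage example are invented.

```python
# Score one ranked result list against the set of known relevant
# documents ("right answers") for a topic, TREC-style at its simplest:
# precision and recall at a fixed rank cutoff k.

def precision_recall_at_k(ranked_ids, relevant_ids, k):
    """ranked_ids: system's ranked list of document IDs for one topic.
    relevant_ids: set of document IDs judged relevant for that topic.
    Returns (precision at k, recall at k)."""
    retrieved = ranked_ids[:k]
    hits = sum(1 for doc in retrieved if doc in relevant_ids)
    precision = hits / k
    recall = hits / len(relevant_ids) if relevant_ids else 0.0
    return precision, recall

# Invented example: one hit ("d1") in the top 3 of a ranked run,
# with three documents judged relevant overall.
p, r = precision_recall_at_k(["d3", "d1", "d7", "d2"], {"d1", "d2", "d9"}, k=3)
```

Here both precision and recall at rank 3 are 1/3: one of the three retrieved documents is relevant, and one of the three relevant documents was retrieved.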
TREC-1 required significant system rebuilding by most groups due to the huge increase in the size of the document collection (from a traditional test collection of several megabytes to the 2-gigabyte TIPSTER collection). The results from TREC-2 showed significant improvements over the TREC-1 results and should be viewed as the appropriate baseline: state-of-the-art retrieval techniques scaled up to handle a 2-gigabyte collection.
TREC-3 therefore provided the first opportunity for more complex experimentation. The major experiments in TREC-3 included the development of automatic query expansion techniques, the use of passages or subdocuments to increase the precision of retrieval results, and the use of the training information to select only the best terms for routing queries. Some groups explored hybrid approaches (such as the use of the Rocchio methodology in systems not using a vector space model), and others tried approaches that were radically different from their original approaches.
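The Rocchio methodology mentioned above can be sketched roughly as follows. This is the generic textbook formulation of Rocchio relevance feedback, not the code of any particular TREC participant; the weights alpha, beta, and gamma are conventional assumed defaults.

```python
# Rocchio relevance feedback: move the query vector toward the centroid
# of known relevant documents and away from the centroid of non-relevant
# ones. Terms that gain positive weight but were absent from the original
# query are exactly the "expansion" terms discussed in the text.

def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """All vectors are dicts mapping term -> weight.
    relevant / non_relevant are lists of such document vectors."""
    terms = set(query)
    for doc in relevant + non_relevant:
        terms.update(doc)
    expanded = {}
    for t in terms:
        rel = (sum(d.get(t, 0.0) for d in relevant) / len(relevant)
               if relevant else 0.0)
        non = (sum(d.get(t, 0.0) for d in non_relevant) / len(non_relevant)
               if non_relevant else 0.0)
        w = alpha * query.get(t, 0.0) + beta * rel - gamma * non
        if w > 0:  # negative weights are conventionally dropped
            expanded[t] = w
    return expanded
```

With an invented one-term query such as `{"oil": 1.0}` and a judged-relevant document containing "spill", the expanded query picks up "spill" automatically, while terms occurring only in non-relevant documents are pushed to zero or below and dropped.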
TREC-4 allowed a continuation of many of these complex experiments. The topics were made much shorter, and this change triggered extensive investigations in automatic query expansion. There were also five new tasks, called tracks, added to help focus research on certain known problem areas. These included investigating searching as an interactive task by examining the process as well as the outcome, investigating techniques for merging results from the various TREC subcollections, examining the effects of corrupted data, and evaluating routing systems using a specific effectiveness measure. Additionally, more groups participated in a track for Spanish retrieval.
TREC-5 represented a continuation (and expansion) of the complex experiments that most of the groups had done in past TRECs. Because the short topics of TREC-4 generated great interest in query expansion given little initial user input, the topics for TREC-5 were created both at a short length (similar to TREC-4) and as a fuller topic (similar to TREC-3). This influenced much of the work in TREC-5 in the main ad hoc task. The smaller focused research tasks, called tracks, also attracted considerable interest. Some of the tracks from TREC-4 were continued, with 7 groups taking part in the Spanish testing, 3 groups in the database merging track, 7 groups in the filtering track, and 2 groups participating in an experimental version of the interactive track. Additionally, there were 3 new tracks in TREC-5, with 9 groups participating in a Chinese retrieval track, 4 groups working in a new NLP track, and 5 groups participating in a revised "confusion" track that retrieved from OCR'd documents.
The TREC conferences have proven to be very successful, allowing broad participation in the overall DARPA TIPSTER effort and encouraging widespread use of a very large test collection. All conferences have had very open, honest discussions of technical issues and significant "cross-fertilization" of ideas. This is a continuing effort: TREC-6 was held in November 1997, and TREC-7 is scheduled for 1998.
[1] Harman, D. (Ed.). Overview of the Third Text REtrieval Conference (TREC-3). National Institute of Standards and Technology Special Publication 500-225, 1994.
[2] Harman, D. (Ed.). The Fourth Text REtrieval Conference (TREC-4). Published in the National Institute of Standards and Technology Special Publication 500 series.