
National Institute of Standards and Technology Home Page
TIPSTER Text Program
A multi-agency, multi-contractor program



Date created: Monday, 31-Jul-00

Text REtrieval Conference (TREC)

Text REtrieval Conferences 1 - 5

For more information regarding Text REtrieval Conferences (TREC), please visit the TREC web site at http://trec.nist.gov.

There have been six Text REtrieval Conferences (TRECs): TREC-1 in November 1992, TREC-2 in August 1993, TREC-3 in November 1994, TREC-4 in November 1995, TREC-5 in November 1996, and TREC-6 in 1997. TREC-7 will be held in 1998. The number of participating systems grew from 25 in TREC-1 to 35 in TREC-5, including most of the major text retrieval software companies and most of the universities doing research in text retrieval (see the participant lists below). The diversity of the participating groups has ensured that TREC represents many different approaches to text retrieval, while the emphasis on individual experiments evaluated in a common setting has proven to be a major strength of TREC.

The test design and test collection used for document detection in TIPSTER were also used in TREC. The participants ran the various tasks, sent their results to NIST for evaluation, presented the results at the TREC conferences, and submitted papers for the proceedings [Harman 1,2]. The test collection consists of over 1 million documents from diverse full-text sources, 250 topics, and the set of relevant documents, or "right answers," to those topics. A Spanish collection with a total of 50 topics was built and used during TREC-3 and TREC-4, and a Chinese track was added in TREC-5.
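The evaluation design above — ranked system output scored against a set of judged relevant documents per topic — can be sketched as follows. This is an illustrative sketch only, not NIST's actual evaluation software; the document IDs and judgments are invented for the example, and average precision is just one of the effectiveness measures used in TREC-style evaluation.

```python
# Hypothetical sketch of scoring one topic TREC-style: a system's ranked
# list of document IDs is compared against the judged relevant set
# (the "right answers"), yielding precision, recall, and average precision.

def average_precision(ranked_docs, relevant):
    """Mean of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Invented data for illustration (not actual TREC documents or judgments).
relevant = {"FT-101", "AP-202", "WSJ-303"}
ranked = ["FT-101", "FT-999", "AP-202", "AP-888", "WSJ-303"]

precision = sum(d in relevant for d in ranked) / len(ranked)  # 3 of 5 retrieved
recall = sum(d in relevant for d in ranked) / len(relevant)   # 3 of 3 relevant
ap = average_precision(ranked, relevant)
```

Averaging such a per-topic score over all topics gives a single collection-wide figure by which the participating systems can be compared.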

TREC-1 required significant system rebuilding by most groups due to the huge increase in the size of the document collection (from a traditional test collection of several megabytes in size to the 2 gigabyte TIPSTER collection). The results from TREC-2 showed significant improvements over the TREC-1 results, and should be viewed as the appropriate baseline representing state-of-the-art retrieval techniques as scaled up to handling a 2 gigabyte collection.

TREC-3 therefore provided the first opportunity for more complex experimentation. The major experiments in TREC-3 included the development of automatic query expansion techniques, the use of passages or subdocuments to increase the precision of retrieval results, and the use of the training information to select only the best terms for routing queries. Some groups explored hybrid approaches (such as the use of the Rocchio methodology in systems not using a vector space model), and others tried approaches that were radically different from their original approaches.
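The Rocchio methodology mentioned above reweights a query toward known relevant documents and away from non-relevant ones. A minimal sketch follows, assuming a simple dictionary representation of term-weight vectors; the parameter values and example vectors are assumptions for illustration, not the settings any TREC group actually used.

```python
# Illustrative Rocchio-style query reweighting:
#   q' = alpha*q + beta*mean(relevant vectors) - gamma*mean(non-relevant vectors)
# Vectors are dicts mapping term -> weight; terms whose new weight is
# non-positive are dropped from the expanded query.

def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query_vec)
    for v in rel_vecs + nonrel_vecs:
        terms |= set(v)
    new_q = {}
    for t in terms:
        w = alpha * query_vec.get(t, 0.0)
        if rel_vecs:
            w += beta * sum(v.get(t, 0.0) for v in rel_vecs) / len(rel_vecs)
        if nonrel_vecs:
            w -= gamma * sum(v.get(t, 0.0) for v in nonrel_vecs) / len(nonrel_vecs)
        if w > 0:
            new_q[t] = w
    return new_q

# Invented example: feedback pulls in "tanker" and "cleanup" from the
# relevant documents and suppresses "price" from the non-relevant one.
q = {"oil": 1.0, "spill": 1.0}
rel = [{"oil": 0.8, "tanker": 0.6}, {"spill": 0.9, "cleanup": 0.4}]
nonrel = [{"oil": 0.2, "price": 0.7}]
expanded = rocchio(q, rel, nonrel)
```

As the text notes, part of the interest in TREC-3 was applying this kind of feedback in systems that did not use a vector space model, where the same idea guides term selection rather than literal vector arithmetic.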

TREC-4 allowed a continuation of many of these complex experiments. The topics were made much shorter, and this change triggered extensive investigations into automatic query expansion. There were also five new tasks, called tracks, which were added to help focus research on certain known problem areas. These included investigating searching as an interactive task by examining the process as well as the outcome, investigating techniques for merging results from the various TREC subcollections, examining the effects of corrupted data, and evaluating routing systems using a specific effectiveness measure. Additionally, more groups participated in a track for Spanish retrieval.

TREC-5 represented a continuation (and expansion) of the complex experiments that most of the groups had done in past TRECs. Because the short topics of TREC-4 generated great interest in query expansion given little initial user input, the topics for TREC-5 were created both at a short length (similar to TREC-4) and as a fuller topic (similar to TREC-3). This influenced much of the work in the main TREC-5 ad hoc task. The smaller focused research tasks, called tracks, also attracted considerable interest. Some of the tracks from TREC-4 were continued, with 7 groups taking part in the Spanish testing, 3 groups in the database merging track, 7 groups in the filtering track, and 2 groups participating in an experimental version of the interactive track. Additionally, there were 3 new tracks in TREC-5: 9 groups participated in a Chinese retrieval track, 4 groups worked in a new NLP track, and 5 groups participated in a revised "confusion" track that retrieved from OCR'd documents.

The TREC conferences have proven to be very successful, allowing broad participation in the overall DARPA TIPSTER effort and encouraging widespread use of a very large test collection. All conferences have had very open, honest discussions of technical issues and significant "cross-fertilization" of ideas. This will be a continuing effort, with the TREC-7 conference scheduled for 1998.

[1] Harman D. (Ed.). Overview of the Third Text REtrieval Conference (TREC-3). National Institute of Standards and Technology Special Publication 500-225, 1994.

[2] Harman D. (Ed.). The Fourth Text REtrieval Conference (TREC-4). Published in the National Institute of Standards and Technology Special Publication 500 series.

TREC-5 Participants

  • Apple Computer
  • Australian National University
  • CLARITECH Corporation
  • City University, London
  • Computer Technology Institute, Greece
  • Cornell University
  • Dublin City University, Ireland
  • FS Consulting
  • GE Corporate R & D/New York University
  • GSI-Erli, France
  • George Mason University
  • IBM Corporation (2 groups)
  • Information Technology Institute, Singapore
  • Institut de Recherche en Informatique de Toulouse
  • InText Systems (Australia)
  • Lexis-Nexis
  • MDS at RMIT, Australia
  • MITRE
  • Monash University, Australia
  • New Mexico State University (2 groups)
  • Open Text Corporation
  • Queens College, CUNY
  • Rutgers University (2 groups)
  • Swiss Federal Institute of Technology (ETH)
  • Universite de Neuchatel
  • University of California, Berkeley
  • University of California, San Diego
  • University of Glasgow
  • University of Illinois at Urbana-Champaign
  • University of Kansas
  • University of Maryland
  • University of Massachusetts, Amherst
  • University of North Carolina
  • University of Waterloo
  • Rank Xerox Research Center

TREC-4 Participants

  • Australian National University
  • CLARITECH/Carnegie Mellon University
  • CITRI, Australia
  • City University, London
  • Cornell University
  • Department of Defense
  • Dublin City University
  • Excalibur Technologies, Inc.
  • FS Consulting
  • GE Corporate R & D
  • New York University
  • George Mason University
  • Georgia Institute of Technology
  • HNC, Inc.
  • Information Technology Institute
  • InText Systems (Australia)
  • Lexis-Nexis
  • Logicon Operating Systems
  • National University of Singapore
  • NEC Corporation
  • New Mexico State University
  • Oracle Corporation
  • Queens College, CUNY
  • Rutgers University (two groups)
  • Siemens Corporate Research Inc.
  • Swiss Federal Institute of Technology (ETH)
  • Universite de Neuchatel
  • University of California - Berkeley
  • University of California - Los Angeles
  • University of Central Florida
  • University of Glasgow
  • University of Kansas
  • University of Massachusetts at Amherst
  • University of Toronto
  • University of Virginia
  • University of Waterloo
  • Xerox Palo Alto Research Center

TREC-3 Participants

  • Australian National University
  • Bellcore
  • Carnegie Mellon University/CLARITECH
  • CITRI, Australia
  • City University, London
  • Cornell University
  • Dublin City University
  • Environmental Research Institute of Michigan
  • Fulcrum
  • George Mason University
  • Logicon, Inc.
  • Mayo Clinic/Foundation
  • Mead Data Central, Inc.
  • National Security Agency
  • NEC Corporation
  • New York University
  • Queens College
  • Rutgers University (two groups)
  • Siemens Corporate Research, Inc.
  • Swiss Federal Institute of Technology (ETH)
  • TRW/Paracel, Inc.
  • University of Massachusetts
  • University of Minnesota
  • University of California, Berkeley
  • University of Dortmund, Germany
  • Universite de Neuchatel
  • University of Central Florida
  • University of Toronto
  • VPI&SU (Virginia Tech)
  • Verity, Inc.
  • West Publishing Company
  • Xerox Palo Alto Research Center
