
National Institute of Standards and Technology Home Page
TIPSTER Text Program
A multi-agency, multi-contractor program



Date created: Monday, 31-Jul-00

Text REtrieval Conference (TREC)

Text REtrieval Conferences 1 - 5

For more information regarding Text REtrieval Conferences (TREC), please visit the TREC web site at http://trec.nist.gov.

There have been six Text REtrieval Conferences (TRECs): TREC-1 in November 1992, TREC-2 in August 1993, TREC-3 in November 1994, TREC-4 in November 1995, TREC-5 in November 1996, and TREC-6 in 1997. TREC-7 will be held in 1998. The number of participating systems grew from 25 in TREC-1 to 35 in TREC-5, including most of the major text retrieval software companies and most of the universities doing research in text retrieval (see the participant lists below). The diversity of the participating groups has ensured that TREC represents many different approaches to text retrieval, while the emphasis on individual experiments evaluated in a common setting has proven to be a major strength of TREC.

The test design and test collection used for document detection in TIPSTER were also used in TREC. The participants ran the various tasks, sent their results to NIST for evaluation, presented the results at the TREC conferences, and submitted papers for the proceedings [Harman 1,2]. The test collection consists of over 1 million documents from diverse full-text sources, 250 topics, and the set of relevant documents, or "right answers," to those topics. A Spanish collection with a total of 50 topics was built and used during TREC-3 and TREC-4, and a Chinese track was added in TREC-5.
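The evaluation design above — ranked system output scored against a set of judged relevant documents per topic — can be sketched as follows. This is an illustrative sketch only, not NIST's actual evaluation software; the document IDs and judgments are invented for the example, and average precision is just one of the effectiveness measures used in TREC-style evaluation.

```python
# Hypothetical sketch of scoring one topic TREC-style: a system's ranked
# list of document IDs is compared against the judged relevant set
# (the "right answers"), yielding precision, recall, and average precision.

def average_precision(ranked_docs, relevant):
    """Mean of the precision values at each rank where a relevant
    document appears, divided by the total number of relevant docs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

# Invented data for illustration (not actual TREC documents or judgments).
relevant = {"FT-101", "AP-202", "WSJ-303"}
ranked = ["FT-101", "FT-999", "AP-202", "AP-888", "WSJ-303"]

precision = sum(d in relevant for d in ranked) / len(ranked)  # 3 of 5 retrieved
recall = sum(d in relevant for d in ranked) / len(relevant)   # 3 of 3 relevant
ap = average_precision(ranked, relevant)
```

Averaging such a per-topic score over all topics gives a single collection-wide figure by which the participating systems can be compared.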

TREC-1 required significant system rebuilding by most groups due to the huge increase in the size of the document collection (from a traditional test collection of several megabytes in size to the 2 gigabyte TIPSTER collection). The results from TREC-2 showed significant improvements over the TREC-1 results, and should be viewed as the appropriate baseline representing state-of-the-art retrieval techniques as scaled up to handling a 2 gigabyte collection.

TREC-3 therefore provided the first opportunity for more complex experimentation. The major experiments in TREC-3 included the development of automatic query expansion techniques, the use of passages or subdocuments to increase the precision of retrieval results, and the use of the training information to select only the best terms for routing queries. Some groups explored hybrid approaches (such as the use of the Rocchio methodology in systems not using a vector space model), and others tried approaches that were radically different from their original approaches.
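The Rocchio methodology mentioned above reweights a query toward known relevant documents and away from non-relevant ones. A minimal sketch follows, assuming a simple dictionary representation of term-weight vectors; the parameter values and example vectors are assumptions for illustration, not the settings any TREC group actually used.

```python
# Illustrative Rocchio-style query reweighting:
#   q' = alpha*q + beta*mean(relevant vectors) - gamma*mean(non-relevant vectors)
# Vectors are dicts mapping term -> weight; terms whose new weight is
# non-positive are dropped from the expanded query.

def rocchio(query_vec, rel_vecs, nonrel_vecs, alpha=1.0, beta=0.75, gamma=0.15):
    terms = set(query_vec)
    for v in rel_vecs + nonrel_vecs:
        terms |= set(v)
    new_q = {}
    for t in terms:
        w = alpha * query_vec.get(t, 0.0)
        if rel_vecs:
            w += beta * sum(v.get(t, 0.0) for v in rel_vecs) / len(rel_vecs)
        if nonrel_vecs:
            w -= gamma * sum(v.get(t, 0.0) for v in nonrel_vecs) / len(nonrel_vecs)
        if w > 0:
            new_q[t] = w
    return new_q

# Invented example: feedback pulls in "tanker" and "cleanup" from the
# relevant documents and suppresses "price" from the non-relevant one.
q = {"oil": 1.0, "spill": 1.0}
rel = [{"oil": 0.8, "tanker": 0.6}, {"spill": 0.9, "cleanup": 0.4}]
nonrel = [{"oil": 0.2, "price": 0.7}]
expanded = rocchio(q, rel, nonrel)
```

As the text notes, part of the interest in TREC-3 was applying this kind of feedback in systems that did not use a vector space model, where the same idea guides term selection rather than literal vector arithmetic.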

TREC-4 allowed a continuation of many of these complex experiments. The topics were made much shorter, and this change triggered extensive investigations into automatic query expansion. There were also five new tasks, called tracks, which were added to help focus research on certain known problem areas. These included investigating searching as an interactive task by examining the process as well as the outcome, investigating techniques for merging results from the various TREC subcollections, examining the effects of corrupted data, and evaluating routing systems using a specific effectiveness measure. Additionally, more groups participated in a track for Spanish retrieval.

TREC-5 represented a continuation (and expansion) of the complex experiments that most of the groups had done in past TRECs. Because the short topics of TREC-4 generated great interest in query expansion given little initial user input, the topics for TREC-5 were created both at a short length (similar to TREC-4) and as a fuller topic (similar to TREC-3). This influenced much of the work in the main TREC-5 ad hoc task. The smaller focused research tasks, called tracks, also attracted considerable interest. Some of the tracks from TREC-4 were continued, with 7 groups taking part in the Spanish testing, 3 groups in the database merging track, 7 groups in the filtering track, and 2 groups participating in an experimental version of the interactive track. Additionally, there were 3 new tracks in TREC-5: 9 groups participated in a Chinese retrieval track, 4 groups worked in a new NLP track, and 5 groups participated in a revised "confusion" track that retrieved from OCR'd documents.

The TREC conferences have proven to be very successful, allowing broad participation in the overall DARPA TIPSTER effort and encouraging widespread use of a very large test collection. All conferences have had very open, honest discussions of technical issues and significant "cross-fertilization" of ideas. This will be a continuing effort, with the TREC-7 conference scheduled for 1998.

[1] Harman D. (Ed.). Overview of the Third Text REtrieval Conference (TREC-3). National Institute of Standards and Technology Special Publication 500-225, 1994.

[2] Harman D. (Ed.). The Fourth Text REtrieval Conference (TREC-4). Published in the National Institute of Standards and Technology Special Publication 500 series.

TREC-5 Participants

  • Apple Computer
  • Australian National University
  • CLARITECH Corporation
  • City University, London
  • Computer Technology Institute, Greece
  • Cornell University
  • Dublin City University, Ireland
  • FS Consulting
  • GE Corporate R & D/New York University
  • GSI-Erli, France
  • George Mason University
  • IBM Corporation (2 groups)
  • Information Technology Institute, Singapore
  • Institut de Recherche en Informatique de Toulouse
  • InText Systems (Australia)
  • Lexis-Nexis
  • MDS at RMIT, Australia
  • MITRE
  • Monash University, Australia
  • New Mexico State University (2 groups)
  • Open Text Corporation
  • Queens College, CUNY
  • Rutgers University (2 groups)
  • Swiss Federal Institute of Technology (ETH)
  • Universite de Neuchatel
  • University of California, Berkeley
  • University of California, San Diego
  • University of Glasgow
  • University of Illinois at Urbana-Champaign
  • University of Kansas
  • University of Maryland
  • University of Massachusetts, Amherst
  • University of North Carolina
  • University of Waterloo
  • Rank Xerox Research Center

TREC-4 Participants

  • Australian National University
  • CLARITECH/Carnegie Mellon University
  • CITRI, Australia
  • City University, London
  • Cornell University
  • Department of Defense
  • Dublin City University
  • Excalibur Technologies, Inc.
  • FS Consulting
  • GE Corporate R & D
  • New York University
  • George Mason University
  • Georgia Institute of Technology
  • HNC, Inc.
  • Information Technology Institute
  • InText Systems (Australia)
  • Lexis-Nexis
  • Logicon Operating Systems
  • National University of Singapore
  • NEC Corporation
  • New Mexico State University
  • Oracle Corporation
  • Queens College, CUNY
  • Rutgers University (two groups)
  • Siemens Corporate Research Inc.
  • Swiss Federal Institute of Technology (ETH)
  • Universite de Neuchatel
  • University of California - Berkeley
  • University of California - Los Angeles
  • University of Central Florida
  • University of Glasgow
  • University of Kansas
  • University of Massachusetts at Amherst
  • University of Toronto
  • University of Virginia
  • University of Waterloo
  • Xerox Palo Alto Research Center

TREC-3 Participants

  • Australian National University
  • Bellcore
  • Carnegie Mellon University/CLARITECH
  • CITRI, Australia
  • City University, London
  • Cornell University
  • Dublin City University
  • Environmental Research Institute of Michigan
  • Fulcrum
  • George Mason University
  • Logicon, Inc.
  • Mayo Clinic/Foundation
  • Mead Data Central, Inc.
  • National Security Agency
  • NEC Corporation
  • New York University
  • Queens College
  • Rutgers University (two groups)
  • Siemens Corporate Research, Inc.
  • Swiss Federal Institute of Technology (ETH)
  • TRW/Paracel, Inc.
  • University of Massachusetts
  • University of Minnesota
  • University of California, Berkeley
  • University of Dortmund, Germany
  • Universite de Neuchatel
  • University of Central Florida
  • University of Toronto
  • VPI&SU (Virginia Tech)
  • Verity, Inc.
  • West Publishing Company
  • Xerox Palo Alto Research Center
