TIPSTER Text Program A multi-agency, multi-contractor program



Date created: Monday, 31-Jul-00

The TIPSTER Text Program was a Defense Advanced Research Projects Agency (DARPA)-led government effort to advance the state of the art in text processing technologies through the cooperation of researchers and developers in Government, industry and academia. The resulting capabilities were deployed within the intelligence community to provide analysts with improved operational tools. Due to lack of funding, the program formally ended in the fall of 1998.

DARPA, the Department of Defense (DoD) and the Central Intelligence Agency (CIA) jointly funded and managed the program, in close collaboration with the National Institute of Standards and Technology (NIST) and the Space and Naval Warfare Systems Center (SPAWAR, or SSC), formerly NCCOSC/NRaD. A TIPSTER Advisory Board was formed in 1998 with members representing users from other Government agencies interested in automated text processing, such as the Department of Energy (DOE), Federal Bureau of Investigation (FBI), Internal Revenue Service (IRS), National Science Foundation (NSF) and Treasury Department.

In its efforts to improve document processing efficiency and cost effectiveness, TIPSTER focused on three underlying technologies:

  • Document Detection: the capability to locate documents containing the type of information the user wants from either a text stream or a store of documents.
  • Information Extraction: the capability to locate specified information within a text.
  • Summarization: the capability to condense the size of a document or collection while retaining the key ideas in the material.

These three capabilities formed the basis for nearly all other information handling tasks.


During the first phase of TIPSTER research efforts (1991-1994), the participants made major advances in creating the algorithms for document detection and information extraction. They also improved the techniques for measuring those advances, through activities such as the Message Understanding Conferences (MUC) and the Text Retrieval Conferences (TREC). Document Detection technologies improved Recall from roughly 30% to as high as 75%, and the improvement in the processing of natural language queries was also significant. Improvements in Information Extraction raised Recall from roughly 49% to 65% and Precision from 55% to 59%. Dramatic gains were also made in the ability to automatically identify a wide range of items such as names (both personal and organizational), dates, locations, times and phone numbers.
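For readers unfamiliar with the two measures quoted above, the following is a minimal Python sketch of how Recall and Precision are conventionally computed over sets of items. It is an illustration only, not the MUC or TREC scoring software; the function name `precision_recall` and the document identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query or extraction task.

    retrieved: items the system returned (hypothetical identifiers)
    relevant:  items a human judge marked as correct
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = len(retrieved & relevant)  # items both returned and correct

    # Precision: fraction of what the system returned that was correct.
    precision = hits / len(retrieved) if retrieved else 0.0
    # Recall: fraction of the correct items that the system found.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: 4 documents returned, 5 judged relevant, 2 in common.
system_output = ["d1", "d2", "d3", "d4"]
judged_relevant = ["d1", "d3", "d5", "d6", "d7"]
print(precision_recall(system_output, judged_relevant))  # (0.5, 0.4)
```

Under this definition, a rise in Recall from 30% to 75% means the system finds three quarters of the relevant items instead of fewer than a third, regardless of how many irrelevant items it also returns.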


During the second phase (April 1994-September 1996), the TIPSTER research and development community turned its attention to the creation of a software architecture in order to standardize the technology components, enable "plug and play" capabilities among the various tools being developed, and permit the sharing of software among the various participants. The architecture continued to evolve, as funding permitted, based on feedback from the researchers, developers, and users of the existing prototype and implementation systems.

The Multilingual Entity Task (MET) developed Chinese and Japanese training collections with over 300 documents in each language. The task was initially confined to Named Entity extraction and the development of a variety of tools such as word boundary finders and part-of-speech tagged Chinese lexicons and dictionaries.

Various research projects and demonstration systems in support of Document Detection and Information Extraction were also completed.


Phase III started in October 1996 and continued to build on Phase I and II achievements with new projects in supporting research, development and evaluation areas. Summarization was also added as a fundamental task area. See the Phase III Overview for details.
