TIPSTER Text Summarization Evaluation Conference
Last updated: Tuesday, 16-Jan-2001 09:06:53 EST
Date created: Monday, 31-Jul-00
SUMMAC Overview

In May 1998, the U.S. government completed the TIPSTER Text Summarization Evaluation (SUMMAC), the first large-scale, developer-independent evaluation of automatic text summarization systems. Two main extrinsic evaluation tasks were defined, based on activities typically carried out by information analysts in the U.S. Government. In the adhoc task, the focus was on indicative summaries tailored to a particular topic. In the categorization task, the evaluation sought to determine whether a generic summary could present enough information to allow an analyst to quickly and correctly categorize a document. A third task, question-answering, involved an intrinsic evaluation in which a topic-related summary for a document was evaluated in terms of its "informativeness", namely, the degree to which it contained answers found in the source document to a set of topic-related questions.
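The notion of "informativeness" described above can be illustrated with a small sketch. This is not the scoring procedure used in SUMMAC; it assumes, purely for illustration, that each question has a known answer string and that an answer counts as present when it appears verbatim in the summary. The function name `answer_coverage` and the sample data are hypothetical.

```python
def _normalize(text):
    """Lowercase and collapse whitespace so matching ignores layout."""
    return " ".join(text.lower().split())


def answer_coverage(summary, answers):
    """Fraction of answer strings that occur verbatim in the summary.

    Assumption: verbatim substring matching stands in for whatever
    judgment (human or automatic) decides that an answer is present.
    """
    if not answers:
        return 0.0
    text = _normalize(summary)
    found = sum(1 for answer in answers if _normalize(answer) in text)
    return found / len(answers)


# Hypothetical example: two answers drawn from a source document.
summary = "The merger was completed in May 1998."
answers = ["May 1998", "Acme Corp"]
coverage = answer_coverage(summary, answers)  # one of two answers found
```

Substring matching is deliberately crude; it misses paraphrased answers, which is one reason human judgments remain the reference point for a measure like this.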

SUMMAC has established definitively, in a large-scale evaluation, that automatic text summarization is very effective in relevance assessment tasks. Summaries at relatively low compression rates (17% for adhoc, 10% for categorization) allowed for relevance assessment almost as accurate as with the full text (5% degradation in F-score for adhoc and 14% degradation for categorization, neither degradation being statistically significant), while reducing decision-making time by 40% (categorization) and 50% (adhoc). In the question-answering task, automatic methods for measuring the informativeness of topic-related summaries were introduced; the systems' scores under these automatic methods were found to correlate positively with informativeness scores assigned by human judges. The methods used in the SUMMAC evaluation are of intrinsic interest both to summarization evaluation and to the evaluation of other "output-related" NLP technologies, where there may be many potentially acceptable outputs, with no automatic way to compare them.
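For readers unfamiliar with the F-score figures quoted above, the following minimal sketch shows how an F-score and its degradation might be computed from relevance decisions. It assumes the balanced F-score (harmonic mean of precision and recall) and a relative, not absolute, degradation; the text does not state which variants were used, and the function names and counts here are hypothetical.

```python
def f_score(true_pos, false_pos, false_neg, beta=1.0):
    """F-score from counts of relevance decisions.

    beta=1.0 gives the balanced harmonic mean of precision and recall
    (an assumption; the weighting used in the evaluation is not stated).
    """
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def relative_degradation(full_text_score, summary_score):
    """Drop in score relative to the full-text baseline (assumed relative)."""
    return (full_text_score - summary_score) / full_text_score


# Hypothetical counts: decisions made from full text vs. from summaries.
full_text_f = f_score(true_pos=80, false_pos=20, false_neg=20)
summary_f = f_score(true_pos=76, false_pos=24, false_neg=24)
drop = relative_degradation(full_text_f, summary_f)
```

With these invented counts the summary-based F-score is lower than the full-text one by a small fraction, which is the kind of comparison the degradation figures describe.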

For more information please contact: Inderjeet Mani (