DUC 2006 Task, Documents for Summarization, and Measures

The system task for DUC 2006 will essentially be the same as the 2005 task and will model real-world complex question answering, in which a question cannot be answered by simply stating a name, date, quantity, etc. Given a topic and a set of 25 relevant documents, the task is to synthesize a fluent, well-organized 250-word summary of the documents that answers the question(s) in the topic statement. Successful performance on the task will benefit from a combination of IR and NLP capabilities, including passage retrieval, compression, and generation of fluent text.

Documents for summarization

NIST assessors will develop topics of interest to them. Each assessor will create a topic and choose a set of 25 documents relevant to the topic. These documents will form the document cluster for that topic. The documents will come from the AQUAINT corpus, comprising newswire articles from the Associated Press and New York Times (1998-2000) and Xinhua News Agency (1996-2000). The corpus has the following DTD: AQUAINT corpus (DTD).

There will be 50 topics in the test data; topics and document clusters will be distributed by NIST. Only DUC 2006 participants who have completed all required forms will be allowed access.

Reference summaries

Each topic and its document cluster will be given to 4 different NIST assessors, including the developer of the topic. Each assessor will create a ~250-word summary of the document cluster that satisfies the information need expressed in the topic. These multiple reference summaries will be used in the evaluation of summary content.

System task

Given a DUC topic and a set of 25 documents relevant to the topic, create from the documents a brief, well-organized, fluent summary which answers the need for information expressed in the topic. The summary can be no longer than 250 words (whitespace-delimited tokens).
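
As a minimal sketch of the length limit, assuming that "words" means exactly the tokens produced by splitting on whitespace, a participant could check and cut a summary as shown below. The function names `count_words` and `truncate_summary` are hypothetical and are not part of any NIST tool; rejoining tokens with single spaces is also an assumption, since the guidelines do not say how whitespace is treated.

```python
def count_words(text: str) -> int:
    """Count whitespace-delimited tokens, the unit used for the size limit."""
    return len(text.split())


def truncate_summary(text: str, max_words: int = 250) -> str:
    """Keep at most max_words whitespace-delimited tokens.

    Assumption: tokens are rejoined with single spaces, so the original
    line breaks and spacing are not preserved.
    """
    tokens = text.split()
    return " ".join(tokens[:max_words])


if __name__ == "__main__":
    draft = " ".join(["word"] * 300)
    final = truncate_summary(draft)
    print(count_words(draft), count_words(final))
```

Running the cut locally before submission lets a group see which text would survive, rather than losing the end of a final sentence.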
Summaries over the size limit will be truncated. No bonus will be given for creating a shorter summary. No specific formatting other than linear is allowed.

Each group can submit one set of results, i.e., one summary for each topic/cluster. Participating groups should be able to evaluate additional results themselves using ISI's ROUGE/BE package.

Evaluation

All summaries will first be truncated to 250 words. Where sentences need to be identified for automatic evaluation, NIST will then use a simple Perl script for sentence segmentation.

NIST will manually evaluate the linguistic well-formedness of each submitted summary using a set of quality questions.

NIST will manually evaluate the relative responsiveness of each submitted summary to the topic. Here are instructions to the assessors for judging responsiveness.

NIST will run the latest version of ROUGE to compute ROUGE-2 and ROUGE-SU4, with stemming and keeping stopwords. Jackknifing will be implemented so that human and system scores can be compared.

NIST will calculate overlap in Basic Elements (BE) between automatic and manual summaries. Summaries will be parsed with Minipar, and BE-F will be extracted. These BEs will be matched using the Head-Modifier criterion.

Columbia University will organize an optional Pyramid evaluation for a subset of the topics. Participants who wish to have their automatic summaries evaluated using the Pyramid method will be asked to help with the manual annotation.

Tools for DUC 2006

ISI's webpage on Basic Elements. Download also includes ROUGE version 1.5.5.
Columbia's 2006 webpage on Pyramids

DUC Workshop Papers and Presentations

Each participant in the system task may submit a paper describing their system architecture, results, and analysis; these papers will be published in the DUC 2006 Workshop Proceedings.
Participants who would like to give oral presentations of their papers at the workshop should submit a presentation proposal in May 2006, and NIST will select the groups who will present at the workshop.
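
The ROUGE-2 scoring and jackknifing mentioned under Evaluation can be illustrated with a rough sketch. This is not the ROUGE 1.5.5 package and will not reproduce its numbers: it assumes lowercased whitespace tokenization, applies no stemming, and computes recall only. The jackknife follows the procedure described in the ROUGE paper (score against each set of M-1 references, take the best score within each set, then average over the M sets). The function names are hypothetical.

```python
from collections import Counter
from typing import List


def bigrams(text: str) -> Counter:
    """Multiset of adjacent token pairs (assumption: lowercase, split on whitespace)."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def rouge2_recall(candidate: str, reference: str) -> float:
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand, ref = bigrams(candidate), bigrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared bigram.
    overlap = sum((cand & ref).values())
    return overlap / total


def jackknife_rouge2(candidate: str, references: List[str]) -> float:
    """Average, over each leave-one-out reference set, of the best ROUGE-2 recall."""
    if len(references) < 2:
        raise ValueError("jackknifing needs at least two references")
    scores = []
    for held_out in range(len(references)):
        subset = [r for i, r in enumerate(references) if i != held_out]
        scores.append(max(rouge2_recall(candidate, r) for r in subset))
    return sum(scores) / len(scores)


if __name__ == "__main__":
    refs = ["the cat sat on the mat", "a dog ran", "the cat ate"]
    print(jackknife_rouge2("the cat sat on the mat", refs))
```

Under this scheme a system summary is always scored against sets of three of the four references, and a human summary can be scored against the three references written by the other assessors, which is what makes the two kinds of scores comparable.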

For data, past results, mailing list, or other general information, contact: Lori Buckland ([email protected])
For other questions contact: Hoa Dang (hoa.dang AT nist.gov)
Date created: Wednesday, 24-November-05