Document Understanding Conferences
DUC 2005: Task, Documents, and Measures

What's new for DUC 2005

DUC 2005 marks a major change in direction from last year. The road-mapping committee strongly recommended that new tasks be undertaken that are strongly tied to a clear user application. The report-writing task discussed at the May meeting was clearly such a task, but since then there has been serious discussion in the program committee about working on new evaluation methodologies and metrics that would better address the human-variation issues discussed in Barcelona. It was therefore decided that the main thrust of DUC 2005 would be a simpler (but still user-oriented) task that would allow the whole community to put some of its effort and time into helping with this new evaluation framework during 2005. The same task will likely be continued in DUC 2006 (which should happen in the late spring of that year) and will focus again on system performance within the improved framework.

The system task in 2005 will be to synthesize from a set of 25-50 documents a brief, well-organized, fluent answer to a need for information that cannot be met by just stating a name, date, quantity, etc. This task models real-world complex question answering and was suggested by "An Empirical Study of Information Synthesis Tasks" (Enrique Amigo, Julio Gonzalo, Victor Peinado, Anselmo Penas, Felisa Verdejo; {enrique,julio,victor,anselmo,felisa}@lsi.uned.es).

The main goals for DUC 2005 are listed below.

1) Inclusion of user/task context information for systems and human summarizers (a minimal sketch of such a context record follows this list)
2) Evaluation of content in terms of more basic units of meaning
3) Better understanding of normal human variability in a summarization task and how it may affect evaluation of summarization systems
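To make goal 1 concrete: the user/task context given to systems and human summarizers consists of a DUC topic plus a user profile, as described under "Documents for summarization" below. The following is a minimal Python sketch of such a record; the field names are illustrative assumptions, not NIST's actual topic format, which is defined by the assessor instructions.

    from dataclasses import dataclass
    from typing import List

    @dataclass
    class DUCTopic:
        """Hypothetical record for one DUC 2005 topic/cluster.

        Field names are illustrative; the real format is defined by
        NIST's assessor instructions.
        """
        topic_id: str            # identifier of the underlying TREC topic
        narrative: str           # the information need the summary must answer
        granularity: str         # from the user profile: "general" or "specific"
        document_ids: List[str]  # the 25-50 relevant documents in the cluster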
Documents for summarization

NIST assessors will be allowed to choose TREC topics of interest to them. Each of these topics will have at least 35 relevant documents associated with it. The assessor will read the documents for a topic, verify the relevance of each, look for aspects of the topic of particular interest, create a DUC topic reflecting that interest, and choose a subset of 25-50 documents relevant to the DUC topic. These documents will be the DUC test document cluster for that topic. The assessor will also specify the desired granularity of the summary ("general" or "specific") in a user profile. Here are the instructions given to the assessors for creating the topics. The documents will come from TREC collections, each with its own tagging (see the DTD). Test documents will be distributed by NIST via the Internet.

Reference summaries

The NIST assessor who developed the DUC topic will create a ~250-word summary of the cluster that meets the need expressed in the topic. The summary will be written at a level of granularity consistent with the granularity requested in the user profile. For each topic, other NIST assessors will also be given the user profile, DUC topic, and document cluster, and will be asked to create a summary that meets the needs expressed in the topic and user profile. These multiple reference summaries will be used in the evaluation. It is our intention, if funding can be secured, to create a total of 4 reference summaries for each of 30 of the topics and 10 reference summaries for each of 20 of the topics. Here are the instructions given to the assessors for writing the summaries. Here are example summaries for two topics.

System task

Given a user profile, a DUC topic, and a cluster of documents relevant to the DUC topic, create from the documents a brief, well-organized, fluent summary that answers the need for information expressed in the topic, at the level of granularity specified in the user profile. The summary can be no longer than 250 words (whitespace-delimited tokens). Summaries over the size limit will be truncated, and no bonus will be given for creating a shorter summary. No specific formatting other than linear is allowed. The summary should include (in some form or other) all the information in the documents that contributes to meeting the information need. Some generalization may be required to fit everything in. Each group can submit one set of results, i.e., one summary for each topic/cluster. Participating groups should be able to evaluate additional results themselves using automatic evaluation tools developed by ISI.

Evaluation

NIST

NIST's role in the evaluation will be limited, since it was considered more important in 2005 for NIST to apply its resources to creating more reference summaries than have been available in the past. All summaries will first be truncated to 250 words. Where sentences need to be identified for automatic evaluation, NIST will then use a simple Perl script for sentence segmentation.

ISI, Columbia, and others
The main evaluation of how well each submitted summary agrees in
content with the manually created reference summaries will be carried
out cooperatively by (hopefully) most of the participating groups under
the leadership of ISI/USC and Columbia University. This evaluation will
explore both automatic and manual approaches.
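To make the automatic side of this concrete, here is a toy Python sketch of the ground rules stated above: truncate a submission to 250 whitespace-delimited words (as NIST will), then score its content overlap against multiple reference summaries via unigram recall. This is an illustrative stand-in, not the actual ISI tools and not the metric the groups will necessarily adopt.

    from collections import Counter

    def truncate(summary: str, limit: int = 250) -> list:
        """Keep at most `limit` whitespace-delimited tokens, lowercased."""
        return summary.lower().split()[:limit]

    def unigram_recall(system: str, references: list) -> float:
        """Average, over references, of the fraction of reference tokens
        (counted with multiplicity) also found in the system summary.
        A crude proxy for content agreement; real measures are richer."""
        sys_counts = Counter(truncate(system))
        scores = []
        for ref in references:
            ref_counts = Counter(truncate(ref))
            overlap = sum(min(count, sys_counts[tok])
                          for tok, count in ref_counts.items())
            scores.append(overlap / sum(ref_counts.values()))
        return sum(scores) / len(scores)

Averaging over several references, rather than comparing against a single one, is what lets the evaluation account for the normal human variability that motivated the multiple reference summaries in the first place.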
Tools for DUC 2005
For data, past results, mailing list, or other general information contact: Lori Buckland (lori.buckland@nist.gov)
For other questions contact: Hoa Dang (hoa.dang AT nist.gov)
Last updated: Friday, 08-Apr-2005 15:18:20 UTC
Date created: Wednesday, 24-November-04