Raw Results of the DUC-2001 Summary Evaluation at NIST

During July, the ten information analysts who selected the 30 test document clusters for DUC-2001 and created the model summaries also assessed the quality of the submitted (peer) summaries. The raw results of this assessment process are now available to participating groups.

One run from each group was evaluated. All summaries within the run were judged, except that only 5 single-document summaries per document set were included (the same 5 for each group). Also, multi-document summaries for document set 31 were not evaluated because the model summaries may have been missing material from some documents.

In the results provided here (phase 1), the assessor always used his or her own summary as the model. For each model, the set of peer summaries comprised the submitted system-created summaries, model summaries created by two other humans, and one or two baselines created automatically (see below for the definitions of the baselines). For use as models, the human-created summaries were divided into elementary discourse units (EDUs) by William Wong (thanks!) and then lightly edited by the summary author to concatenate glaringly short, difficult-to-interpret units with a preceding or following one. All summaries used as peers were divided roughly into sentences by a simple Perl program based on one developed at ISI and adapted slightly at NIST to the document collection. A total of 3926 summary pairs were judged. (The phase 2 evaluation will involve a second assessor, one not familiar with the document sets, judging against two different models, one of them the same as the model used in phase 1.)
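
The sentence separator and the abbreviation table it consults are included in the evaluation directory as breaksent-multi.pl and abbreviations_list (see the file list below). Purely as an illustration of the kind of heuristic such a separator uses, here is a minimal sketch; the regular expressions, the dbmopen call, and the plain-text-on-STDIN interface are assumptions for illustration, not a description of the actual script.

    #!/usr/bin/perl -w
    # Minimal sketch of abbreviation-aware sentence separation.
    # NOT the actual breaksent-multi.pl: the regexes and the interface
    # (plain text on STDIN, one sentence per output line) are assumptions.
    use strict;

    # abbreviations_list is described as a simple Perl disk hashtable;
    # dbmopen is one plausible way such a table would be read.
    my %abbrev;
    dbmopen(%abbrev, "abbreviations_list", undef)
        or die "cannot open abbreviations_list: $!";

    while (my $line = <STDIN>) {
        chomp $line;
        my @sentence;
        foreach my $tok (split " ", $line) {
            push @sentence, $tok;
            # Treat ".", "?" or "!" (possibly followed by a quote or bracket)
            # as a sentence boundary unless the token is a known abbreviation.
            if ($tok =~ /[.?!]["')\]]*$/) {
                (my $bare = $tok) =~ s/[.?!"'()\[\]]+$//;
                next if exists $abbrev{$bare} || exists $abbrev{lc $bare};
                print join(" ", @sentence), "\n";
                @sentence = ();
            }
        }
        print join(" ", @sentence), "\n" if @sentence;
    }
    dbmclose(%abbrev);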

The result data are available as a set of files. Document selectors / summarizers / assessors are identified by a code (A-I, K). Systems will not be named until the DUC workshop; in the meantime each system is identified by a single-letter code (L-Z), and each group has been provided with its own code. Baselines are numbered (1, 2), where the meaning of baseline 1 depends on the summary type: P (per-document) or M (multi-document). All the summaries are formatted for SEE.

Here is a list of the files you will find in the evaluation directory. Included are the baseline summaries, the submitted system-created summaries, the human summaries in peer and model formats, the SEE output file for each comparison, the script that drives the DUC version of SEE, and assorted other supporting information:

abbreviations_list
- abbreviations used in sentence separation (as a simple Perl disk hashtable)
assessment_instructions.doc
- instructions to the assessors
baseline_definitions
- definitions of the baselines
baselines.tar.gz
- the baseline summaries in sentences
breaksent-multi.pl
- simple sentence separator
models.tar.gz
- the lightly edited human summaries in EDUs
models_as_peers.tar.gz
- the human summaries in sentences
results1_table
- a tabular version of the SEE output for each comparison pair in phase 1 (see the sketch after this list)
results_table_header
- labels for the columns in results1_table
SEE_file_naming
- SEE summary file and output file naming conventions
SEE_script1
- the script used by the DUC version of SEE to drive the evaluation process; you can use it along with the summary and output files to SEE what the assessors saw! Be careful: it will modify the output files.
submissions.tar.gz
- the submitted summaries in sentences
subjects_and_types
- the summary author's subject and type designations
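
The table layout itself is defined by results_table_header. As a rough sketch of how the two results files might be paired for further analysis, something like the following could work; the whitespace-delimited layout assumed here is an illustration only, so check results_table_header and results1_table for the actual format.

    #!/usr/bin/perl -w
    # Sketch: pair results1_table rows with the labels in results_table_header.
    # The whitespace-delimited layout is an assumption for illustration only.
    use strict;

    # Read the column labels.
    open(my $hdr, "<", "results_table_header")
        or die "cannot open results_table_header: $!";
    my @labels = split " ", join(" ", <$hdr>);
    close $hdr;

    # Read each comparison record into a hash keyed by column label.
    open(my $tab, "<", "results1_table")
        or die "cannot open results1_table: $!";
    my @records;
    while (my $line = <$tab>) {
        chomp $line;
        next unless $line =~ /\S/;
        my %rec;
        @rec{@labels} = split " ", $line;
        push @records, \%rec;
    }
    close $tab;

    printf "read %d comparison records, %d columns each\n",
        scalar(@records), scalar(@labels);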

For data, past results, the mailing list, or other general information,
contact: Lori Buckland ([email protected])
For other questions, contact: Paul Over ([email protected])
Date created: Friday, 26-July-02