MUC Data Sets

For each evaluation, ground truth had to be established to determine the reliability of the participating systems. Datasets were typically prepared by human annotators for training, dry-run testing, and formal-run testing. These datasets are now being made available wherever possible on this website.

The texts used for MUC 6 and MUC 7 are copyrighted and are available only through the Linguistic Data Consortium (LDC) for a small fee. Both sets are newswire articles: the MUC 6 texts come from the MUC-VI Text Collection, and the MUC 7 texts come from the North American News Text Corpora.

Contact the LDC to license the texts, and request the public-domain prepared datasets used in MUC and the MUC scoring software. The MUC 3 and MUC 4 Data Sets are provided free of charge courtesy of FBIS (Foreign Broadcast Information Service). The MET 2 Data Sets are provided free of charge courtesy of the US Government. They are available here as compressed tar archives.

MUC 3 and MUC 4 Data Sets

MET 2 Data Sets

Note: If your browser displays the data instead of a download dialog box, download the file and save it before uncompressing and untarring it.
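Once an archive has been saved, the uncompress-and-untar step can be sketched in Python as follows. This is a minimal illustration, not part of the original distribution: the function name `extract_archive` and the file name in the usage comment are hypothetical, and it assumes the archive uses a compression that Python's `tarfile` module can read (gzip, bzip2, or xz). Files compressed with Unix `compress` (`.Z`) are not handled by `tarfile` and would first need to be uncompressed with an external tool.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, dest_dir):
    """Unpack a compressed tar archive into dest_dir; return extracted member names."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression type on its own.
    with tarfile.open(archive_path, mode="r:*") as tar:
        safe_members = []
        for member in tar.getmembers():
            target = (dest / member.name).resolve()
            # Skip any entry that would be written outside dest_dir.
            if target == dest or dest in target.parents:
                safe_members.append(member)
        tar.extractall(path=dest, members=safe_members)
        return [m.name for m in safe_members]


# Hypothetical usage (the file name is an assumption, not the real archive name):
# extract_archive("muc34.tar.gz", "muc34_data")
```

Checking each member's destination before extracting is a precaution against archives containing paths that point outside the target directory.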

For more website information contact: Ellen Voorhees
For more evaluation information contact: Nancy Chinchor
Last updated:
Date created: Friday, 12-Jan-01