TIPSTER Text Summarization Evaluation Conference (SUMMAC) Computation and Language (cmp-lg) corpus



Date created: Monday, 31-Jul-00

Computation and Language (cmp-lg) corpus

As part of the TIPSTER SUMMAC effort, a corpus of 183 documents from the Computation and Language (cmp-lg) collection has been marked up in XML and made available as a general resource to the information retrieval, extraction, and summarization communities. The documents are scientific papers that appeared in conferences sponsored by the Association for Computational Linguistics (ACL). The markup was produced by automatic conversion from LaTeX to XML and is therefore fairly minimal. (However, something is often better than nothing!) It includes tags for core information such as title, author, and date, as well as for basic structure such as abstract, body, sections, and lists. Figures, tables, equations, cross-references, and references have all been replaced with placeholder tags.

cmplg-xml.tar.gz

The corpus was prepared by The MITRE Corporation and the University of Edinburgh.

For more information, contact Simone Teufel.

The DTD used for the markup is available here: mini.dtd.txt
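
As a rough illustration of how the corpus might be read, the sketch below walks cmplg-xml.tar.gz and pulls the title and abstract out of each document using Python's standard library. The element names TITLE and ABSTRACT, the .xml file extension, and the helper names are assumptions made for this example, not taken from the DTD; check mini.dtd.txt for the actual names before relying on it.

```python
import tarfile
import xml.etree.ElementTree as ET

# Assumed element names (hypothetical); consult mini.dtd.txt for the real ones.
TITLE_TAG = "TITLE"
ABSTRACT_TAG = "ABSTRACT"


def element_text(root, tag):
    """Return the whitespace-normalized text of the first <tag> descendant, or None."""
    node = root.find(f".//{tag}")
    if node is None:
        return None
    # itertext() also collects text around nested placeholder tags.
    return " ".join("".join(node.itertext()).split())


def summarize_document(xml_bytes):
    """Parse one document and return its title and abstract as a dict."""
    root = ET.fromstring(xml_bytes)
    return {
        "title": element_text(root, TITLE_TAG),
        "abstract": element_text(root, ABSTRACT_TAG),
    }


def iter_corpus(archive_path):
    """Yield (member name, summary dict) for each .xml file in the tarball."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            # The .xml suffix is an assumption about how the files are named.
            if member.isfile() and member.name.endswith(".xml"):
                data = archive.extractfile(member).read()
                yield member.name, summarize_document(data)
```

A call such as `for name, doc in iter_corpus("cmplg-xml.tar.gz"): print(name, doc["title"])` would then list the titles. If the files use entities that are defined only in the DTD, `ET.fromstring` may raise a `ParseError`, in which case a validating parser would be the safer choice.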