DUC 2002: Automatically produced baseline abstracts

Unlike in DUC 2001, we did not truncate any sentences. We allowed baselines to run as much as 15 words over the target size. If adding another sentence would have made the baseline summary exceed the target by more than 15 words, we did not include that sentence, and the summary was then shorter than the target size.
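The length rule above can be sketched roughly as follows. This is an illustration only, not the code used to produce the DUC baselines. The function name `select_sentences` and the `slack` parameter are hypothetical, sentence splitting is assumed to have been done already, and the sketch assumes that selection stops at the first sentence that does not fit rather than trying later ones.

```python
def select_sentences(sentences, target, slack=15):
    """Pick whole sentences in order, never truncating one.

    A sentence is added only if the running total stays within
    target + slack words. Assumption: once a sentence does not fit,
    selection stops, so the result may be shorter than the target.
    """
    chosen = []
    total = 0
    for sentence in sentences:
        if total >= target:
            break  # target already reached
        n_words = len(sentence.split())  # whitespace-delimited words
        if total + n_words > target + slack:
            break  # would overshoot by more than the allowed slack
        chosen.append(sentence)
        total += n_words
    return chosen
```

With a target of 5 words and a slack of 1, a 3-word sentence followed by a 4-word sentence yields only the first sentence, since 7 words would exceed the limit of 6.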

In the following, words are whitespace-delimited non-tag tokens found in the TEXT, LEADPARA, LP, etc. portions of the document.
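One way to read this definition of a word is sketched below. The sketch assumes SGML-style markup of the form `<TEXT>...</TEXT>`, and it only knows the three element names listed above; the "etc." means the real set may be larger. The names `CONTENT_TAGS` and `document_words` are hypothetical.

```python
import re

# Assumption: only the element names listed in the text; the real set may be larger.
CONTENT_TAGS = ("TEXT", "LEADPARA", "LP")

def document_words(sgml, content_tags=CONTENT_TAGS):
    """Return whitespace-delimited non-tag tokens from the content elements,
    in document order."""
    names = "|".join(re.escape(tag) for tag in content_tags)
    pattern = re.compile(r"<(%s)>(.*?)</\1>" % names, flags=re.DOTALL)
    words = []
    for match in pattern.finditer(sgml):
        # Drop any markup nested inside the element, then split on whitespace.
        body = re.sub(r"<[^>]+>", " ", match.group(2))
        words.extend(body.split())
    return words
```

Under this reading, punctuation attached to a token counts as part of the word, and text in other elements (for example a document number) is not counted.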

For single-document summarization:

Baseline 1 (lead baseline)

Take the first 100 words* in the document.

For multi-document summarization:

Baseline 2 (lead baseline)

Take the first 50, 100, and 200 words* in the last document in the collection, where docs are assumed to be ordered chronologically. (No baseline of this type for 10-word summaries.)
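A minimal sketch of Baseline 2, under stated assumptions: `docs` is a hypothetical list of documents already in chronological order, each document is a list of sentence strings, and the no-truncation rule with 15 words of slack described at the top is applied. The function name `last_doc_lead_baselines` is illustrative.

```python
def last_doc_lead_baselines(docs, targets=(50, 100, 200), slack=15):
    """Build one lead summary per target size from the last document.

    docs: list of documents in chronological order; each document is
    a list of sentence strings (sentence splitting is assumed done).
    """
    last = docs[-1]  # most recent document
    summaries = {}
    for target in targets:
        chosen, total = [], 0
        for sentence in last:
            n_words = len(sentence.split())
            # Stop once the target is met or the sentence would overshoot.
            if total >= target or total + n_words > target + slack:
                break
            chosen.append(sentence)
            total += n_words
        summaries[target] = " ".join(chosen)
    return summaries
```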

Baseline 3 (coverage baseline)

Take the first sentence in the first doc, the first sentence in the second doc, the first sentence in the third doc, ... until you have 50, 100, or 200 words. (No baseline of this type for 10-word summaries.)
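The coverage baseline might look roughly like the sketch below. It assumes each document is a list of sentence strings, that the 15-word slack rule from the top of this page applies, and that selection ends when the documents run out (the description does not say what happens in that case). The name `coverage_baseline` is hypothetical.

```python
def coverage_baseline(docs, target, slack=15):
    """Concatenate the first sentence of each document, in order,
    until the target word count is reached.

    docs: list of documents; each document is a list of sentence strings.
    Assumption: stop when the documents are exhausted, even if the
    summary is still shorter than the target.
    """
    chosen, total = [], 0
    for doc in docs:
        if total >= target:
            break
        if not doc:
            continue  # skip documents with no sentences
        first = doc[0]
        n_words = len(first.split())
        if total + n_words > target + slack:
            break  # whole sentences only, no truncation
        chosen.append(first)
        total += n_words
    return " ".join(chosen)
```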

