NIST automatically created three baselines, based roughly on algorithms suggested by Daniel Marcu in earlier work:
- Take the first 100 words in the document.
- Take the first 50, 100, 200, or 400 words in the most recent document.
  - 23.3% of the 400-word summaries were shorter than the target.
- Take the first sentence in the 1st, 2nd, 3rd,… document in chronological sequence until you have the target summary size. Truncate the last sentence if target size is exceeded.
  - 86.7% of the 400-word summaries and 10% of the 200-word summaries were shorter than the target.
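
The three baselines above can be sketched roughly as follows. This is a minimal illustration, not NIST's actual implementation: the function names, the `(date, text)` document representation, whitespace word counting, and the naive regex sentence splitter are all assumptions made for the example.

```python
import re
from typing import List, Tuple

# Hypothetical representation: (ISO date string "YYYY-MM-DD", document text).
Doc = Tuple[str, str]


def first_words(text: str, target: int) -> str:
    """Baseline 1: the first `target` whitespace-delimited words of a document."""
    return " ".join(text.split()[:target])


def most_recent_lead(docs: List[Doc], target: int) -> str:
    """Baseline 2: the first `target` words of the most recent document."""
    # ISO date strings sort chronologically, so max() picks the latest one.
    _, text = max(docs, key=lambda d: d[0])
    return first_words(text, target)


def first_sentence(text: str) -> str:
    """Naive splitter (assumption): a sentence ends at ., ! or ? plus whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip(), maxsplit=1)[0]


def chronological_first_sentences(docs: List[Doc], target: int) -> str:
    """Baseline 3: first sentence of each document in chronological order,
    stopping once `target` words are reached and truncating any overshoot."""
    words: List[str] = []
    for _, text in sorted(docs, key=lambda d: d[0]):
        words.extend(first_sentence(text).split())
        if len(words) >= target:
            break
    return " ".join(words[:target])
```

In this sketch, a summary comes out shorter than the target whenever the source material runs out first: the most recent document has fewer than `target` words, or the first sentences of all documents together do. That behavior would be consistent with the shortfall percentages reported above, although the exact procedure NIST used may have differed.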