Summing up …
Per-unit content (coverage):
- Assessors were often in a quandary about how much information to bring to the interpretation of the summaries.
- Even for simple sentences/EDUs, determination of shared meaning was very hard.
For the future:
- Results seem to pass sanity check?
- Was the assessor time worth it in terms of what researchers can learn from the output?