Phases
Summary evaluation and evaluation evaluation
Phase 1: Assessor judged peers against his or her own models.
Phase 2: Assessor judged a subset of peers for a subset of docsets twice, against two other humans' summaries.
Phase 3 (not implemented): Two different assessors judge the same peers using the same models.