Phase 2 initial results
Designed to gauge effect of different models
Restricted to 50- and 200-word multi-document summaries
Assessor used 2 models created by other authors
Within-assessor differences mostly very small:
Still want to compare to original judgments…