Inconsistency
Most frequently cited “problem” of test collections
- undeniably true that relevance is highly subjective; judgments vary by assessor, and for the same assessor over time …
- … but no evidence that these differences affect the comparative evaluation of systems (i.e., how systems rank relative to one another)
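The claim concerns the relative ranking of systems, not their absolute scores. The following is a minimal sketch of how that distinction can be checked. Everything in it is hypothetical and invented for illustration: the three runs (`sys1` to `sys3`), the single query `q1`, and two assessors' relevance sets (`qrels_a`, `qrels_b`) that disagree on `d2` vs `d4`. Each system is scored by mean average precision (MAP) under each assessor's judgments, and the two resulting system orderings are compared with Kendall's tau (tau-a here), a common choice for comparing rankings.

```python
from itertools import combinations


def average_precision(ranking, relevant):
    """AP of one ranked list against a set of relevant doc ids."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank  # precision at the rank of this relevant doc
    return total / len(relevant)


def mean_average_precision(run, qrels):
    """MAP of one system's run (query -> ranked doc ids) over all judged queries."""
    return sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} dicts over the same systems."""
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1  # both assessors order this pair the same way
        elif product < 0:
            discordant += 1  # the pair flips between assessors
    n_pairs = len(systems) * (len(systems) - 1) / 2
    return (concordant - discordant) / n_pairs


# Hypothetical runs and judgments, made up for this illustration only.
runs = {
    "sys1": {"q1": ["d1", "d2", "d3", "d4", "d5"]},
    "sys2": {"q1": ["d5", "d1", "d3", "d2", "d4"]},
    "sys3": {"q1": ["d6", "d5", "d4", "d1", "d2"]},
}
qrels_a = {"q1": {"d1", "d2", "d3"}}  # assessor A
qrels_b = {"q1": {"d1", "d3", "d4"}}  # assessor B disagrees on d2 and d4

map_a = {s: mean_average_precision(r, qrels_a) for s, r in runs.items()}
map_b = {s: mean_average_precision(r, qrels_b) for s, r in runs.items()}
print(map_a)
print(map_b)
print(kendall_tau(map_a, map_b))
```

On this toy data the two assessors give different scores (`sys1` falls from 1.0 to about 0.81). The ordering `sys1 > sys2 > sys3` is the same under both, so tau is 1.0. The sketch only shows how such agreement is measured; it is not evidence for the slide's claim.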