Effect of Different Judgments
Similar, highly correlated results are found when using:
- different query sets
- different evaluation measures
- different groups of assessors
- single opinion vs. group opinion judgments
Conclusion: comparative results are stable despite the idiosyncratic nature of relevance judgments
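
One way to make "highly correlated" concrete is to rank the same systems under two sets of judgments and compare the rankings. The sketch below assumes Kendall's tau as the correlation measure (the slide does not name one). The system names and scores are hypothetical illustration values, not results from any study.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts over the same systems.

    Counts system pairs ordered the same way (concordant) versus the
    opposite way (discordant) under the two sets of judgments.
    Tied pairs count as neither.
    """
    systems = sorted(scores_a)
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        diff_a = scores_a[s] - scores_a[t]
        diff_b = scores_b[s] - scores_b[t]
        if diff_a * diff_b > 0:
            concordant += 1
        elif diff_a * diff_b < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical mean scores for four systems, evaluated against
# relevance judgments from two different assessors.
assessor_1 = {"sysA": 0.31, "sysB": 0.27, "sysC": 0.22, "sysD": 0.18}
assessor_2 = {"sysA": 0.25, "sysB": 0.26, "sysC": 0.17, "sysD": 0.12}

tau = kendall_tau(assessor_1, assessor_2)
print(f"Kendall's tau between the two system rankings: {tau:.3f}")
```

In this toy data the absolute scores differ between assessors and only one pair of systems (sysA, sysB) swaps order, so the rankings still agree on 5 of 6 pairs. That is the sense in which comparative results can be stable even when individual judgments are not.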