Experiment:

Given three independent sets of relevance judgments for each of 48 TREC-4 topics

Rank the TREC-4 runs by mean average precision as evaluated using different combinations of judgments
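The ranking step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original evaluation code. Judgments are held as hypothetical `qrels` dictionaries (topic id to set of relevant document ids), and runs are dictionaries from topic id to a ranked document list. Average precision is the usual non-interpolated form. "Combination of judgments" is assumed here to mean picking one of the three judgment sets independently for each topic. Under that reading there are 3^48 possible combinations, so a study would sample them rather than enumerate them. The helper names (`combine_judgments`, `random_choice`, `rank_runs`) are hypothetical.

```python
import random
from typing import Dict, List, Set

# Hypothetical data layouts (assumptions, not from the slide):
#   Run:   topic id -> ranked list of document ids, best first
#   Qrels: topic id -> set of document ids judged relevant
Run = Dict[str, List[str]]
Qrels = Dict[str, Set[str]]


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for a single topic."""
    if not relevant:
        return 0.0  # assumption: a topic with no relevant docs scores 0
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant doc
    # Relevant docs that were never retrieved add 0 to the sum.
    return precision_sum / len(relevant)


def mean_average_precision(run: Run, qrels: Qrels) -> float:
    """Mean of per-topic average precision over every topic in qrels."""
    total = sum(average_precision(run.get(t, []), rel) for t, rel in qrels.items())
    return total / len(qrels)


def random_choice(topics: List[str], n_sets: int, rng: random.Random) -> Dict[str, int]:
    """Hypothetical helper: pick one judgment set index per topic at random."""
    return {t: rng.randrange(n_sets) for t in topics}


def combine_judgments(judge_sets: List[Qrels], choice: Dict[str, int]) -> Qrels:
    """Hypothetical helper: build one qrels using the set chosen for each topic."""
    return {t: judge_sets[choice[t]][t] for t in choice}


def rank_runs(runs: Dict[str, Run], qrels: Qrels) -> List[str]:
    """Run names ordered from highest to lowest MAP under the given qrels."""
    return sorted(runs, key=lambda name: mean_average_precision(runs[name], qrels),
                  reverse=True)
```

For example, with two toy judgment sets for one topic, `rank_runs` can order the same two runs differently depending on which set is used; that kind of disagreement is what the correlation step then quantifies.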

Compute the correlation among the resulting run rankings
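The slide does not name the correlation measure. Kendall's tau is a common choice for comparing two orderings of the same set of retrieval runs, and it is assumed in the sketch below. It counts the pairs of runs that the two rankings order the same way (concordant) and the opposite way (discordant): tau = (concordant - discordant) / total pairs, so 1.0 means identical rankings and -1.0 means fully reversed rankings. This version assumes no ties, which holds when each ranking is a strict ordering of run names.

```python
from itertools import combinations
from typing import List


def kendall_tau(ranking_a: List[str], ranking_b: List[str]) -> float:
    """Kendall's tau between two tie-free orderings of the same runs (assumed measure)."""
    if len(ranking_a) != len(ranking_b) or set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must order the same runs")
    if len(set(ranking_a)) != len(ranking_a):
        raise ValueError("rankings must not contain duplicates")
    if len(ranking_a) < 2:
        raise ValueError("need at least two runs to compare")
    pos_b = {run: i for i, run in enumerate(ranking_b)}
    concordant = discordant = 0
    # combinations() yields each pair (x, y) with x ranked above y in ranking_a.
    for x, y in combinations(ranking_a, 2):
        if pos_b[x] < pos_b[y]:
            concordant += 1  # ranking_b agrees on this pair
        else:
            discordant += 1  # ranking_b swaps this pair
    return (concordant - discordant) / (concordant + discordant)
```

This pairwise loop is quadratic in the number of runs, which is acceptable at the scale of a single TREC track; a library routine such as `scipy.stats.kendalltau` could be used instead.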
