Cranfield Tradition

Note the emphasis on comparative evaluation!
- absolute score of some effectiveness measure not meaningful
  - absolute score changes when assessor changes
  - query variability not accounted for
  - impact of collection size, etc. not accounted for
  - theoretical maximum of 1.0 for both recall & precision not obtainable by humans
- evaluation results are only comparable when they are from the same collection
  - a subset of a collection is a different collection
  - direct comparison of scores from two different TREC collections is invalid
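
A minimal sketch of why the comparison, not the absolute score, is what carries meaning. The document IDs, the two system rankings, and the two assessors' relevance sets are all hypothetical toy data invented for illustration, and precision at rank k is used only as one example of an effectiveness measure.

```python
def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k retrieved documents that are judged relevant."""
    top_k = ranking[:k]
    return sum(1 for doc in top_k if doc in relevant) / k


# Hypothetical rankings from two systems for the same query and collection.
system_a = ["d1", "d2", "d3", "d4", "d5"]
system_b = ["d6", "d1", "d7", "d2", "d8"]

# Hypothetical relevance judgments from two different assessors.
assessor_1 = {"d1", "d2", "d3"}
assessor_2 = {"d1", "d3"}

# Score each system under each assessor's judgments.
scores = {}
for name, judged in (("assessor_1", assessor_1), ("assessor_2", assessor_2)):
    scores[name] = {
        "A": precision_at_k(system_a, judged, 5),
        "B": precision_at_k(system_b, judged, 5),
    }

for name, result in scores.items():
    better = "A" if result["A"] > result["B"] else "B"
    print(name, result, "better system:", better)
```

In this toy case the absolute precision of each system changes when the assessor changes, while the comparison between the two systems on the same collection comes out the same way. Real judgments need not behave this cleanly; the sketch only illustrates the distinction the slide draws.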
