Two levels of experimentation
Within-site
- Comparison of systems/components in terms of both the details of the search process and summary measures
Cross-site (matrix experiment)
- Comparison of systems with regard to summary measures only
- Did some systems perform significantly better than others?
- How strong were the main effects of topic and searcher?
- How large were the interactions among system, topic, and searcher?
- How large should the sample have been?
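A minimal sketch of how the cross-site questions map onto an analysis. It assumes a balanced, fully crossed system × topic × searcher design with one summary score per cell. The actual matrix experiment may have used a different, partially crossed layout. The sketch estimates the main effects and shows what an additive model leaves unexplained, which, with one observation per cell, mixes interactions with noise. It then gives a normal-approximation sample-size estimate. The names `main_effects`, `residuals`, and `runs_per_system` are hypothetical illustrations, not part of any track software.

```python
import math
from collections import defaultdict
from statistics import NormalDist, mean

def main_effects(scores):
    """Estimate main effects in an assumed balanced, fully crossed design.

    scores: dict mapping (system, topic, searcher) -> summary measure.
    Returns the grand mean and, for each factor (0 = system, 1 = topic,
    2 = searcher), a dict of level -> (marginal mean - grand mean).
    """
    grand = mean(list(scores.values()))
    effects = []
    for axis in range(3):
        groups = defaultdict(list)
        for key, value in scores.items():
            groups[key[axis]].append(value)
        effects.append({level: mean(vals) - grand for level, vals in groups.items()})
    return grand, effects

def residuals(scores, grand, effects):
    """What the additive main-effects model leaves unexplained per cell.

    With one observation per cell, these residuals combine the
    interactions with pure noise; they cannot be separated.
    """
    return {
        key: value - grand - sum(effects[a][key[a]] for a in range(3))
        for key, value in scores.items()
    }

def runs_per_system(delta, sigma, alpha=0.05, power=0.8):
    """Approximate runs per system needed to detect a difference `delta`
    between two systems (two-sided test), assuming independent normal
    errors with standard deviation `sigma`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)
```

For example, under these assumptions, detecting a difference of 0.1 when the per-run standard deviation is 0.2 gives `runs_per_system(0.1, 0.2)` of 63 runs per system. Large residuals relative to the main effects would suggest strong interactions, which in turn inflate the effective `sigma` and the required sample.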