What is going on here?
Groups are working harder on the tracks
Inconsistency in relevance judgments is causing a ceiling effect
- Voorhees claims that this ceiling is 65% precision at 65% recall
There is no user interaction
- Only what a user would see after entering the initial query is being measured
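
One way to read the ceiling effect noted above: if one assessor's relevance judgments are scored as though they were a system's output against a second assessor's judgments, the precision and recall that result show how closely the two people agree, and a system can hardly be expected to agree with the judgments more closely than that. The sketch below shows only the arithmetic. The document IDs and both assessors' judgments are made up for illustration, and `precision_recall` is a hypothetical helper, not part of any evaluation toolkit.

```python
def precision_recall(candidate_relevant, reference_relevant):
    """Score one set of relevant documents against another set treated as truth."""
    candidate = set(candidate_relevant)
    reference = set(reference_relevant)
    overlap = len(candidate & reference)  # documents both sets call relevant
    precision = overlap / len(candidate) if candidate else 0.0
    recall = overlap / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical judgments for a single topic (invented data).
assessor_a = {"d1", "d2", "d3", "d4"}        # treated as the reference
assessor_b = {"d2", "d3", "d4", "d5", "d6"}  # scored as if it were a system

precision, recall = precision_recall(assessor_b, assessor_a)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

With these made-up judgments the two assessors share three documents, so assessor B scores 3/5 precision and 3/4 recall against assessor A.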