TRECVID 2003: Issues and Resolutions
(last updated: Friday, 09-May-2003 14:16:34 EDT)
- Segmentation
- Who would be willing and able to donate the standard shot reference?
- Resolution: We will take CLIPS-IMAG up on their offer to provide the
standard shot reference again. Although it would be theoretically cleaner
to have no minimum shot size, we have practical problems with playing and
manually judging shots shorter than two seconds, so we will ask that such
shots be merged with an adjacent shot (see the sketch after this item).
Similarly, we will accept long shots, since we lack the time to develop
good ways of subsegmenting them.
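The following is a minimal sketch of the merging rule, not the CLIPS-IMAG procedure; it assumes shots are represented as (start, end) times in seconds, and the function name and representation are illustrative only.

```python
# Minimal sketch of the short-shot merging rule: any shot shorter than
# MIN_SHOT_SECS is absorbed into its preceding neighbor (or into the
# following shot if it is the first one). The (start, end) representation
# in seconds is an assumption for illustration.

MIN_SHOT_SECS = 2.0

def merge_short_shots(shots):
    """shots: list of (start, end) tuples in seconds, in temporal order."""
    merged = []
    for start, end in shots:
        if merged and (end - start) < MIN_SHOT_SECS:
            prev_start, _ = merged[-1]
            merged[-1] = (prev_start, end)   # absorb into the previous shot
        else:
            merged.append((start, end))
    # If the very first shot is still too short, fold it into the next one.
    if len(merged) > 1 and (merged[0][1] - merged[0][0]) < MIN_SHOT_SECS:
        merged[1] = (merged[0][0], merged[1][1])
        merged.pop(0)
    return merged
```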
- Feature extraction
- Is "Indoors" too frequent to be practical and useful - given the high
percentage of shots of anchors in studios?
- Resolution: Replace with "Outdoors".
- Should "weather news" be replaced with something much more specific
like "weather map"?
- Should "sport segment" be replaced with something much more specific
such as "goal scored" in American football?
- Should we continue to use average precision as a measure where a
single number is useful?
- How do we set the maximum result size and ensure we handle evaluation
of very frequent as well as very rare features efficiently?
- Resolution: Set the maximum result set size to be very large (based
on the total number of shots in the test collection). Allow systems to
submit fewer shots than the maximum. Pool submissions and judge the
pool for a given feature only to the depth needed for that
feature. That is, judge the pool for a given feature in stages, adding
shots from deeper and deeper in the pools only if additional shots
containing the feature are found at the current stage (see the sketch
after this item).
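A rough sketch of the staged judging idea follows; the stage depth and the judge() callback are assumptions for illustration, not the actual NIST assessment procedure.

```python
# Illustrative sketch of judging a feature's pool in stages: shots from
# deeper in the submitted result lists are added only while the previous
# stage keeps turning up additional shots containing the feature.

def judge_pool_in_stages(runs, judge, stage_depth=100):
    """
    runs: list of ranked shot-id lists, one per submitted run.
    judge: callable(shot_id) -> bool, True if the shot contains the feature.
    Returns the set of shots judged to contain the feature.
    """
    judged, hits = set(), set()
    depth = stage_depth
    while True:
        # Pool = union of the top `depth` shots across all runs.
        pool = {sid for run in runs for sid in run[:depth]}
        new_shots = pool - judged
        if not new_shots:
            break                          # pools are exhausted
        new_hits = {sid for sid in new_shots if judge(sid)}
        judged |= new_shots
        hits |= new_hits
        if not new_hits:
            break                          # no additional hits: stop going deeper
        depth += stage_depth
    return hits
```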
- Should we require submissions on the entire test set but only evaluate on
a subset?
- Resolution: No, use other means to keep pool sizes manageable.
- How many runs will be accepted from each group?
- Resolution: At most 10, prioritized. NIST will determine how many can
be included in the pools based on the number of participating groups,
overlap, etc., up to the median number of runs submitted. All submitted
runs will then be evaluated using the judgments on the pools.
- Search
- Should all groups be required to submit results for all topics within any given run?
- Resolution: Yes, otherwise comparisons based on averaging across topics don't work.
- Should we include one or more topics of this type: "Tell me the name of the person in the example video/image"?
- How else can we make it easier to compare systems despite the confounding effect of the human in the loop?
- Resolution: (for interactive and manual tasks)
- Only one person per group will perform all the manual searches.
- The searcher has no experience with the topics beyond the general knowledge any adult might have.
- The search system has not been trained, pre-configured, or otherwise
tuned to the topics.
- How do we set the maximum result size?
- Resolution: Set the maximum result set size to be very large (based
on the total number of shots in the test collection). Allow systems to
submit fewer shots than the maximum. Pool submissions and judge the
pool for a given topic only to the depth needed for that topic. That
is, judge the pool for a given topic in stages, adding shots from
deeper and deeper in the pools only if additional relevant shots are
found at the current stage (as in the feature pooling sketch above).
- How many runs will be accepted from each group?
- Resolution: At most 10, prioritized. NIST will determine how many can
be included in the pools based on the number of participating groups,
overlap, etc., up to the median number of runs submitted. All submitted
runs will then be evaluated using the judgments on the pools.
- Should we keep the number of topics low (~25) and encourage
more runs designed to help assess the effect of various
system configurations?
- Should some types of runs be required?
- Resolution: Yes - a run using the topic text against the ASR output /
closed-captions-based transcript only; and a run using just the topic
text and one designated non-text example per topic.
- Under what, if any, realistic scenario would commercials be the target of topics?
- Resolution: A person creating or buying TV advertising wants to find out
what sorts of advertising video material is out there - context for new
advertising. A psychologist studying the context of commercials. NIST may
create some topics that have answers in commercials.
- Under what, if any, realistic scenario would standard weather reports or stock market reports be the target of topics?
- Resolution: Someone looking for history on a company's performance. OK,
but NIST will not deliberately create topics with answers in standard
weather and stock market reports.
- General
- Revise the denominator of average precision to be MIN(k, N), where
k = max result set size and N = # of correct shots in the ground truth;
this allows a best-case value of AP = 1 (see the sketch after this item).
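A short sketch of the revised measure, assuming a ranked result list and a set of correct shots; the names are illustrative.

```python
# Average precision with the revised denominator min(k, N), where k is the
# maximum result-set size and N the number of correct shots in the ground
# truth. With this change a perfect run can reach AP = 1.

def average_precision(ranked_shots, relevant, max_result_size):
    """ranked_shots: ranked list of shot ids; relevant: set of correct shot ids."""
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots[:max_result_size], start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(max_result_size, len(relevant))
    return precision_sum / denom if denom else 0.0
```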
- CNN, at least, repeats the same footage. Do we need to change our task definitions or evaluation in light of this repetition?
- Resolution: Not practical to address this problem this year.
- Will someone provide keyframes for the development collection?
- Resolution: We will ask CLIPS-IMAG to do this along with creating the standard shot reference. DCU is willing if CLIPS cannot.
- Will ASR output and/or closed captioning text be provided?
- Resolution: Both will be provided.
- Should we add a story segmentation task?
- Resolution: Yes, we will.
- How can sites make it easier to compare techniques within their site?
How can we make it easier to compare techniques across sites?
How do experimenters avoid/minimize/balance learning effects with respect
to topics in manual searches when many runs are needed and only one searcher
can be used for all manual runs?
- Resolution: Better experimental design to hold constant, block, or
balance by randomization all significant effects (other than that of the
system). For example: use only one searcher for all manual search runs in
a group; randomize the order in which topics are searched for each run
(see the sketch after this item); hold search conditions constant; etc.
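As one concrete illustration of the randomization point, the sketch below shuffles the topic order independently for each manual run so that learning effects are not confounded with topic order; the fixed seed, run ids, and topic numbers are illustrative assumptions.

```python
import random

def randomized_topic_orders(topic_ids, run_ids, seed=2003):
    """Return a per-run randomized ordering of the topics."""
    rng = random.Random(seed)          # fixed seed keeps the design reproducible
    orders = {}
    for run in run_ids:
        order = list(topic_ids)
        rng.shuffle(order)
        orders[run] = order
    return orders

# Example: 25 topics and 4 manual runs, all performed by the same searcher.
# orders = randomized_topic_orders(range(100, 125), ["run_1", "run_2", "run_3", "run_4"])
```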
Date created: Monday, 9-May-03