TRECVID 2003: Issues and Resolutions
(last updated: Friday, 09-May-2003 14:16:34 EDT)
- Segmentation
- Who would be willing and able to donate the standard shot reference?
- Resolution: We will take CLIPS-IMAG up on their offer to provide the
standard shot reference again. Although it would be theoretically cleaner
to have no minimum shot size, we have practical problems with playing and
manually judging shots shorter than two seconds, so we will ask that such
shots be merged with an adjacent shot (see the sketch after this item).
Similarly, we will accept long shots, since we lack the time to develop
good ways of subsegmenting them.
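The following is a minimal sketch of the merging rule, not the CLIPS-IMAG procedure; it assumes shots are represented as (start, end) times in seconds, and the function name and representation are illustrative only.

```python
# Minimal sketch of the short-shot merging rule: any shot shorter than
# MIN_SHOT_SECS is absorbed into its preceding neighbor (or into the
# following shot if it is the first one). The (start, end) representation
# in seconds is an assumption for illustration.

MIN_SHOT_SECS = 2.0

def merge_short_shots(shots):
    """shots: list of (start, end) tuples in seconds, in temporal order."""
    merged = []
    for start, end in shots:
        if merged and (end - start) < MIN_SHOT_SECS:
            prev_start, _ = merged[-1]
            merged[-1] = (prev_start, end)   # absorb into the previous shot
        else:
            merged.append((start, end))
    # If the very first shot is still too short, fold it into the next one.
    if len(merged) > 1 and (merged[0][1] - merged[0][0]) < MIN_SHOT_SECS:
        merged[1] = (merged[0][0], merged[1][1])
        merged.pop(0)
    return merged
```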
- Feature extraction
- Is "Indoors" too frequent to be practical and useful - given the high
percentage of shots of anchors in studios?
- Resolution: Replace with "Outdoors".
- Should "weather news" be replaced with something much more specific
like "weather map"?
- Should "sport segment" be replaced with something much more specific
such as "goal scored" in American football?
- Should we continue to use average precision as a measure where a
single number is useful?
- How do we set the maximum result size and ensure we handle evaluation
of very frequent as well as very rare features efficiently?
- Resolution: Set the maximum result set size to be very large (based
on the total number of shots in the test collection). Allow systems to
submit fewer shots than the maximum. Pool submissions and judge the
pool for a given feature only to the depth needed for that
feature. That is, judge the pool for a given feature in stages, adding
shots from deeper and deeper in the pools only if additional shots
containing the feature are found at the current stage (see the sketch
after this item).
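A rough sketch of the staged judging idea follows; the stage depth and the judge() callback are assumptions for illustration, not the actual NIST assessment procedure.

```python
# Illustrative sketch of judging a feature's pool in stages: shots from
# deeper in the submitted result lists are added only while the previous
# stage keeps turning up additional shots containing the feature.

def judge_pool_in_stages(runs, judge, stage_depth=100):
    """
    runs: list of ranked shot-id lists, one per submitted run.
    judge: callable(shot_id) -> bool, True if the shot contains the feature.
    Returns the set of shots judged to contain the feature.
    """
    judged, hits = set(), set()
    depth = stage_depth
    while True:
        # Pool = union of the top `depth` shots across all runs.
        pool = {sid for run in runs for sid in run[:depth]}
        new_shots = pool - judged
        if not new_shots:
            break                          # pools are exhausted
        new_hits = {sid for sid in new_shots if judge(sid)}
        judged |= new_shots
        hits |= new_hits
        if not new_hits:
            break                          # no additional hits: stop going deeper
        depth += stage_depth
    return hits
```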
- Should we require submissions on the entire test set but only evaluate on
a subset?
- Resolution: No, use other means to keep pool sizes manageable.
- How many runs will be accepted from each group?
- Resolution: At most 10, prioritized. NIST will determine how many can
be included in the pools based on the number of participating groups,
overlap, etc., up to the median number of runs submitted. All submitted
runs will then be evaluated using the judgments on the pools.
- Search
- Should all groups be required to submit results for all topics within any given run?
- Resolution: Yes, otherwise comparisons based on averaging across topics don't work.
- Should we include one or more topics of this type: "Tell me the name of the person in the example video/image"?
- How else can we make it easier to compare systems despite the confounding effect of the human in the loop?
- Resolution: (for interactive and manual tasks)
- Only one person per group will perform all the manual searches.
- The searcher has no experience with the topics beyond the general knowledge any adult might have.
- The search system has not been trained, pre-configured, or otherwise
tuned to the topics.
- How do we set the maximum result size?
- Resolution: Set the maximum result set size to be very large (based
on the total number of shots in the test collection). Allow systems to
submit fewer shots than the maximum. Pool submissions and judge the
pool for a given topic only to the depth needed for that topic. That
is, judge the pool for a given topic in stages, adding shots from
deeper and deeper in the pools only if additional relevant shots are
found at the current stage (as in the feature pooling sketch above).
- How many runs will be accepted from each group?
- Resolution: At most 10, prioritized. NIST will determine how many can
be included in the pools based on the number of participating groups,
overlap, etc., up to the median number of runs submitted. All submitted
runs will then be evaluated using the judgments on the pools.
- Should we keep the number of topics low (~25) and encourage
more runs designed to help assess the effect of various
system configurations?
- Should some types of runs be required?
- Resolution: Yes - a run using the topic text against the ASR output /
closed-captions-based transcript only; and a run using just the topic
text and one designated non-text example per topic.
- Under what, if any, realistic scenario would commercials be the target of topics?
- Resolution: A person creating or buying TV advertising wants to find out
what sorts of advertising video material is out there - context for new
advertising. A psychologist studying the context of commercials. NIST may
create some topics that have answers in commercials.
- Under what, if any, realistic scenario would standard weather reports or stock market reports be the target of topics?
- Resolution: Someone looking for history on a company's performance. OK,
but NIST will not deliberately create topics with answers in standard
weather and stock market reports.
- General
- Revise the denominator of average precision to be MIN(k, N), where
k = max result set size and N = # of correct shots in the ground truth;
this allows a best-case value of AP = 1 (see the sketch after this item).
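A short sketch of the revised measure, assuming a ranked result list and a set of correct shots; the names are illustrative.

```python
# Average precision with the revised denominator min(k, N), where k is the
# maximum result-set size and N the number of correct shots in the ground
# truth. With this change a perfect run can reach AP = 1.

def average_precision(ranked_shots, relevant, max_result_size):
    """ranked_shots: ranked list of shot ids; relevant: set of correct shot ids."""
    hits, precision_sum = 0, 0.0
    for rank, shot in enumerate(ranked_shots[:max_result_size], start=1):
        if shot in relevant:
            hits += 1
            precision_sum += hits / rank
    denom = min(max_result_size, len(relevant))
    return precision_sum / denom if denom else 0.0
```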
- CNN, at least, repeats the same footage. Do we need to change our task definitions or evaluation in light of this repetition?
- Resolution: Not practical to address this problem this year.
- Will someone provide keyframes for the development collection?
- Resolution: We will ask CLIPS-IMAG to do this along with creating the standard shot reference. DCU is willing if CLIPS cannot.
- Will ASR output and/or closed captioning text be provided?
- Resolution: Both will be provided.
- Should we add a story segmentation task?
- Resolution: Yes, we will.
- How can sites make it easier to compare techniques within their site?
How can we make it easier to compare techniques across sites?
How do experimenters avoid/minimize/balance learning effects with respect
to topics in manual searches when many runs are needed and only one searcher
can be used for all manual runs?
- Resolution: Better experimental design to hold constant, block, or
balance by randomization all significant effects (other than that of the
system). For example: use only one searcher for all manual search runs in
a group; randomize the order in which topics are searched for each run
(see the sketch after this item); hold search conditions constant; etc.
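As one concrete illustration of the randomization point, the sketch below shuffles the topic order independently for each manual run so that learning effects are not confounded with topic order; the fixed seed, run ids, and topic numbers are illustrative assumptions.

```python
import random

def randomized_topic_orders(topic_ids, run_ids, seed=2003):
    """Return a per-run randomized ordering of the topics."""
    rng = random.Random(seed)          # fixed seed keeps the design reproducible
    orders = {}
    for run in run_ids:
        order = list(topic_ids)
        rng.shuffle(order)
        orders[run] = order
    return orders

# Example: 25 topics and 4 manual runs, all performed by the same searcher.
# orders = randomized_topic_orders(range(100, 125), ["run_1", "run_2", "run_3", "run_4"])
```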
Date created: Monday, 9-May-03