These MER results are preliminary, are provided to facilitate analysis prior
to the TRECVID workshop, and are not to be released outside the TRECVID
community.
Preliminary Sesame Results of the MER 2014
Evaluation
Date: October 10, 2014
Caveat Emptor
The results below are for a single participant in the TRECVID MER
Evaluation. The results are provided to facilitate analysis
of MER technologies prior to the TRECVID workshop. NIST is
not providing a cross-participant set of results at this time
because results are not directly comparable across teams.
MER Annotation Questions
The tables below are preliminary and likely to change based on NIST's
continued analysis.
The MER system outputs were evaluated on two levels: the duration of the key
evidence snippets, and questions posed to judges about the query used to
generate the recounting, the extracted evidence, and how well the MER output
convinced the judge that the video contained the event.
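
As context for the Recounting Percent reported in the Results section, the duration measure can be read as the total length of a recounting's key evidence snippets divided by the length of the original video. The sketch below only illustrates that arithmetic with hypothetical, non-overlapping snippet data; it is not the official MER scoring code, and the field layout is an assumption.

    # Illustrative only: Recounting Percent as key-evidence duration divided by
    # the original video duration. The snippet list (start, end) pairs and the
    # assumption that snippets do not overlap are hypothetical, not the official
    # MER scoring schema.

    def recounting_percent(snippets, video_duration_s):
        """Return total snippet duration as a percent of the original video."""
        evidence_s = sum(end_s - start_s for start_s, end_s in snippets)
        return 100.0 * evidence_s / video_duration_s

    # Hypothetical example: 24 s of evidence in a 120 s video -> 20%
    snippets = [(5.0, 13.0), (40.0, 50.0), (90.0, 96.0)]
    print(f"{recounting_percent(snippets, 120.0):.0f}%")
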
Nominally, five judges per video per team were asked to answer the following
questions, posed as Likert-style statements (a small sketch of how responses
are tallied into the reported distributions follows the list):
Event Query Quality:
    Likert text: "This seems like a concise and logical query that would be
    created for the event."
    Scope: Answered for each judged event query.
Evidence Quality:
    Likert text: "The evidence presented convinces me that the video contains
    the [Event Name] event."
    [Event Name]: The name of the MED event.
    Scope: Answered for each judged recounting.
Tag Quality:
    Likert text: "The evidence presented convinces me that the video contains
    the [Event Name] event."
    [Event Name]: The name of the MED event.
    Scope: Answered for each judged recounting.
Temporal Evidence Localization:
    Likert text: "The system chose the right window of time to present the
    evidence"
    Scope: Answered for snippets containing 2 or more frames.
Spatial Evidence Localization:
    Likert text: "The system chose the right bounding box(es) to isolate the
    evidence"
    Scope: Answered for snippets that include bounding boxes.
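
The percentages in the Results section are the distribution of judge responses over the Likert levels (plus Not Available where a question did not apply). The tally below is a minimal sketch that assumes a plain list of response strings per question; it is not the official judgment file format or scoring tool.

    from collections import Counter

    # Illustrative only: turn one question's judge responses into the percent
    # distribution reported in the Results tables. The input format (a plain
    # list of response strings) is an assumption, not the MER judgment schema.
    LEVELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree",
              "Strongly Agree", "Not Available"]

    def likert_distribution(responses):
        """Return the percent of responses at each Likert level, rounded."""
        counts = Counter(responses)
        return {level: round(100 * counts[level] / len(responses))
                for level in LEVELS}

    # Hypothetical judgments for one question (not actual MER data)
    print(likert_distribution(["Agree", "Agree", "Strongly Agree",
                               "Neutral", "Disagree"]))
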
Results

Recounting Percent (as a Percent of Original Video Duration)

    Team        Recounting Percent
    Sesame *    20%

Likert response distributions for Sesame * (percent of judgments).
SD = Strongly Disagree, D = Disagree, N = Neutral, A = Agree,
SA = Strongly Agree, NA = Not Available. A dash indicates the NA response was
not recorded for that question.

    Question                          SD     D      N      A      SA     NA
    Event Query Quality               6%     10%    12%    54%    17%    -
    Evidence Quality                  24%    13%    10%    30%    22%    -
    Tag Quality                       24%    24%    12%    18%    23%    -
    Temporal Evidence Localization    22%    16%    15%    26%    21%    0%
    Spatial Evidence Localization     0%     0%     0%     0%     0%     100%
* - Debugged MER submissions
History:
    V5 - Initial Version.
    V6 - Used the correct name "Event Query Quality" instead of
         "Query Conciseness".
       - Used the correct name "Evidence Quality" instead of
         "Key Evidence Convincing".
       - Used the correct name "Recounting Percent" for the duration of key
         evidence.
       - Added Tag Quality, Temporal Evidence Localization, and Spatial
         Evidence Localization.