These MER results are preliminary, are provided to facilitate analysis prior
to the TRECVID workshop, and are not to be released outside the TRECVID
community.
Preliminary Sesame Results of the MER 2014
Evaluation
Date: October 10, 2014
Caveat Emptor
The results below are for a single participant in the TRECVID MER
Evaluation. The results are provided to facilitate analysis
of MER technologies prior to the TRECVID workshop. NIST is
not providing a cross-participant set of results at this time
because results are not directly comparable across teams.
MER Annotation Questions
The tables below are preliminary and likely to change based on NIST's
continued analysis.
The MER system outputs were evaluated on two levels: the duration of the key
evidence snippets, and questions posed to judges about the query used to
generate the recounting, the extracted evidence, and how well the MER output
convinced the judge that the video contained the event.
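
As context for the Recounting Percent reported in the Results section, the duration measure can be read as the total length of a recounting's key evidence snippets divided by the length of the original video. The sketch below only illustrates that arithmetic with hypothetical, non-overlapping snippet data; it is not the official MER scoring code, and the field layout is an assumption.

    # Illustrative only: Recounting Percent as key-evidence duration divided by
    # the original video duration. The snippet list (start, end) pairs and the
    # assumption that snippets do not overlap are hypothetical, not the official
    # MER scoring schema.

    def recounting_percent(snippets, video_duration_s):
        """Return total snippet duration as a percent of the original video."""
        evidence_s = sum(end_s - start_s for start_s, end_s in snippets)
        return 100.0 * evidence_s / video_duration_s

    # Hypothetical example: 24 s of evidence in a 120 s video -> 20%
    snippets = [(5.0, 13.0), (40.0, 50.0), (90.0, 96.0)]
    print(f"{recounting_percent(snippets, 120.0):.0f}%")
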
Nominally, five judges per video per team were asked to answer the following
questions, posed as Likert-style statements (a small sketch of how responses
are tallied into the reported distributions follows the list):
Event Query Quality:
    Likert text: "This seems like a concise and logical query that would be
    created for the event."
    Scope: Answered for each judged event query.
Evidence Quality:
    Likert text: "The evidence presented convinces me that the video contains
    the [Event Name] event."
    [Event Name]: The name of the MED event.
    Scope: Answered for each judged recounting.
Tag Quality:
    Likert text: "The evidence presented convinces me that the video contains
    the [Event Name] event."
    [Event Name]: The name of the MED event.
    Scope: Answered for each judged recounting.
Temporal Evidence Localization:
    Likert text: "The system chose the right window of time to present the
    evidence"
    Scope: Answered for snippets containing 2 or more frames.
Spatial Evidence Localization:
    Likert text: "The system chose the right bounding box(es) to isolate the
    evidence"
    Scope: Answered for snippets that include bounding boxes.
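
The percentages in the Results section are the distribution of judge responses over the Likert levels (plus Not Available where a question did not apply). The tally below is a minimal sketch that assumes a plain list of response strings per question; it is not the official judgment file format or scoring tool.

    from collections import Counter

    # Illustrative only: turn one question's judge responses into the percent
    # distribution reported in the Results tables. The input format (a plain
    # list of response strings) is an assumption, not the MER judgment schema.
    LEVELS = ["Strongly Disagree", "Disagree", "Neutral", "Agree",
              "Strongly Agree", "Not Available"]

    def likert_distribution(responses):
        """Return the percent of responses at each Likert level, rounded."""
        counts = Counter(responses)
        return {level: round(100 * counts[level] / len(responses))
                for level in LEVELS}

    # Hypothetical judgments for one question (not actual MER data)
    print(likert_distribution(["Agree", "Agree", "Strongly Agree",
                               "Neutral", "Disagree"]))
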
Results

Recounting Percent (as a Percent of Original Video Duration)

    Team        Recounting Percent
    Sesame *    20%

Likert response distributions for Sesame * (percent of judgments).
SD = Strongly Disagree, D = Disagree, N = Neutral, A = Agree,
SA = Strongly Agree, NA = Not Available. A dash indicates the NA response was
not recorded for that question.

    Question                          SD     D      N      A      SA     NA
    Event Query Quality               6%     10%    12%    54%    17%    -
    Evidence Quality                  24%    13%    10%    30%    22%    -
    Tag Quality                       24%    24%    12%    18%    23%    -
    Temporal Evidence Localization    22%    16%    15%    26%    21%    0%
    Spatial Evidence Localization     0%     0%     0%     0%     0%     100%
* - Debugged MER submissions
History:
    V5 - Initial Version.
    V6 - Used the correct name "Event Query Quality" instead of
         "Query Conciseness".
       - Used the correct name "Evidence Quality" instead of
         "Key Evidence Convincing".
       - Used the correct name "Recounting Percent" for the duration of key
         evidence.
       - Added Tag Quality, Temporal Evidence Localization, and Spatial
         Evidence Localization.