In addition to the above folders, there is a MOVIE_NAME.entity.types.txt
file for each movie which shows the type of each key entity in the movie (persons, locations, animals).
Each entity name should have corresponding image examples in the "images" folder.
Also, queries at the movie and scene level use those exact same names to refer to
persons or entities (locations) in the movie.
All relations between entities (person to person, person to location) must be selected from the provided relationship ontology.
New in 2022, the DVU challenge introduces an evaluation element that addresses the explainability of results within the Video Understanding domain. To that end, some movie-level and scene-level queries require systems to submit, along with their results, one of the following: a scene id, indicating which scene the system believes best provides evidence of the relation asked about in a movie-level relationship question (Query type 3), or a temporal segment (start time & end time) that localizes the chosen interaction in a scene for query types 3 and 4.
Find all possible paths question.
The task for systems will be to list all possible paths from the source person to the target person.
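One way to approach the path-listing task is a depth-first search over a knowledge graph built for the movie. The sketch below is only an illustration under stated assumptions: the graph layout (a dict mapping each person to a list of relation/neighbor pairs), the function name find_all_paths, and the person and relation names in the example are all hypothetical and are not part of the dataset or the official tooling. It also assumes that only simple paths (no repeated node) are wanted.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: node name -> list of (relation, neighbor) edges.
Graph = Dict[str, List[Tuple[str, str]]]


def find_all_paths(graph: Graph, source: str, target: str) -> List[List[str]]:
    """Return every simple path (no repeated node) from source to target.

    Each path is a flat list alternating node and relation names,
    e.g. ["A", "rel_1", "B", "rel_2", "C"].
    """
    paths: List[List[str]] = []

    def dfs(node: str, visited: Set[str], trail: List[str]) -> None:
        if node == target:
            paths.append(list(trail))  # store a copy of the current trail
            return
        for relation, neighbor in graph.get(node, []):
            if neighbor in visited:
                continue  # never revisit a node, so cycles are skipped
            visited.add(neighbor)
            trail.extend([relation, neighbor])
            dfs(neighbor, visited, trail)
            del trail[-2:]  # backtrack: drop the relation and the neighbor
            visited.remove(neighbor)

    dfs(source, {source}, [source])
    return paths


# Made-up people and relation names, for illustration only.
example_graph: Graph = {
    "Alice": [("Friend_Of", "Bob"), ("Sibling_Of", "Carol")],
    "Bob": [("Colleague_Of", "Carol")],
    "Carol": [],
}
all_paths = find_all_paths(example_graph, "Alice", "Carol")
```

In this toy graph the search finds two paths from Alice to Carol: the direct edge and the route through Bob.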
Fill in the part of graph question.
The task for systems will be to identify the person / entity labelled Unknown_#.
All of Unknown's relations with other people / entities / concepts are listed.
In cases where one of these related nodes occurs more than once in the part of
graph questions, that node's name has been replaced with <BLANK>. Therefore, any
node labelled <BLANK> is guaranteed to be one of the nodes named in this group of questions.
The subject type will always be the source person we are asking about. The predicate
will always be that person’s relation with another person, entity, or concept.
The subject in this question always contains the Unknown you are being asked to identify.
Multiple choice questions.
The task for systems will be to identify the correct answer for Unknown out of the 6 possible answers provided.
The subject type will always be the source person we are asking about.
The predicate will always be that person’s relation with another person, entity, or concept.
In this question the Unknown you are being asked to identify will always be in either the predicate or object.
NOTE:
For questions that ask for relationships, there is an attribute called 'scene' (e.g. scene="")
to be used by systems when submitting the final answer to indicate the id of the scene that best
represents the relationship they selected.
Example:
<item type="Relation" answer="Teacher_At" scene="6"/>
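A submission line of this shape can be produced with Python's standard xml.etree.ElementTree module rather than by string concatenation, which keeps attribute values properly escaped. This is a minimal sketch: the helper name relation_item is hypothetical, and the surrounding elements of the submission file are not described in this section, so only the single item element is built here.

```python
import xml.etree.ElementTree as ET


def relation_item(answer: str, scene_id: int) -> str:
    """Serialize one relationship answer together with its evidence scene id."""
    item = ET.Element(
        "item",
        {"type": "Relation", "answer": answer, "scene": str(scene_id)},
    )
    # encoding="unicode" makes tostring return a str instead of bytes.
    return ET.tostring(item, encoding="unicode")


xml_line = relation_item("Teacher_At", 6)
print(xml_line)
```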
Find the unique scene.
Given a full, inclusive list of interactions unique to a specific scene in the movie,
teams should find which scene this is.
The subject type will always be the scene to be identified.
Teams will return the scene id, based on the segmented scenes reference files (csv files)
and/or segmented movie shots.
Fill in the graph space.
Find the person (labelled Unknown_#) in a specific scene with the following interactions with others.
Teams will be given a scene number, and a list of interactions to and from other people.
Teams should find the only person in that scene with those interactions.
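The matching step described above can be sketched as a substitution test: try each person in the scene in place of the unknown and keep the one for whom every listed interaction actually occurs. The triple representation (subject, interaction, object), the function name resolve_unknown, and the people and the "hugs" interaction in the example are assumptions made for illustration; in practice a system would first have to derive such triples from the video.

```python
from typing import List, Optional, Set, Tuple

# Hypothetical representation of one interaction: (subject, interaction, object).
Triple = Tuple[str, str, str]


def resolve_unknown(
    scene_triples: Set[Triple],
    query: List[Triple],
    unknown: str = "Unknown_1",
) -> Optional[str]:
    """Return the single person who satisfies every query triple, else None."""
    people = {t[0] for t in scene_triples} | {t[2] for t in scene_triples}
    matches = []
    for person in sorted(people):
        # Replace the placeholder with the candidate in every query triple.
        substituted = [
            tuple(person if name == unknown else name for name in triple)
            for triple in query
        ]
        if all(triple in scene_triples for triple in substituted):
            matches.append(person)
    # The task expects exactly one person; anything else counts as unresolved.
    return matches[0] if len(matches) == 1 else None


# Made-up scene content, for illustration only.
scene = {
    ("Alice", "talks to", "Bob"),
    ("Bob", "hugs", "Alice"),
    ("Carol", "talks to", "Bob"),
}
question = [("Unknown_1", "talks to", "Bob"), ("Bob", "hugs", "Unknown_1")]
answer = resolve_unknown(scene, question)
```

Here Carol also talks to Bob, but only Alice satisfies both interactions, so she is the unique match.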
Find the next interaction in scene X between person Y and person Z.
Given a specific scene X and a specific interaction between person Y and person Z,
participants will be asked to select the next interaction between person Y and
person Z, in either scene X or scene X + N, from a set of multiple choice options of different interactions.
The two persons in this question are always the subject and the object, while the interaction is the predicate.
NOTE:
Two questions in this query type are selected to ask systems to submit
a start time and end time of the interaction they selected for their answer.
The XML attributes are 'start_time' and 'end_time'.
The timestamps should be relative to the scene in the question and given in seconds (elapsed since the beginning of the scene).
Example:
<item type="Interaction" answer="talks to" start_time="30" end_time="35"/>
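If a system tracks interaction times against the whole movie, they need to be shifted to scene-relative seconds before submission. The sketch below assumes the scene's start offset within the movie is known in seconds (for instance from the scene segmentation reference files, whose exact columns are not described here). The function name scene_relative_times and the numbers in the example are hypothetical.

```python
from typing import Dict


def scene_relative_times(
    abs_start: float, abs_end: float, scene_start: float
) -> Dict[str, str]:
    """Convert movie-level timestamps (seconds) into scene-relative attribute values."""
    start = abs_start - scene_start
    end = abs_end - scene_start
    if start < 0 or end < start:
        raise ValueError("segment starts before the scene or ends before it starts")
    # Whole seconds mirror the example above; whether fractional
    # seconds are accepted is not stated, so this is an assumption.
    return {"start_time": str(int(round(start))), "end_time": str(int(round(end)))}


# A scene assumed to begin 1200 s into the movie, with an interaction
# detected between 1230 s and 1235 s on the movie timeline.
attrs = scene_relative_times(abs_start=1230.0, abs_end=1235.0, scene_start=1200.0)
```

The resulting values can be set directly as the 'start_time' and 'end_time' attributes of the submitted item.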
Find the previous interaction in scene X between person Y and person Z.
Given a specific scene X and a specific interaction between person Y and person Z,
participants will be asked to select the previous interaction between person Y and
person Z, in either scene X or scene X - N, from a set of multiple choice options of different interactions.
The two persons in this question are always the subject and the object, while the interaction is the predicate.
NOTE:
Two questions in this query type are selected to ask systems to submit
a start time and end time of the interaction they selected for their answer.
The XML attributes are 'start_time' and 'end_time'.
The timestamps should be relative to the scene in the question and given in seconds (elapsed since the beginning of the scene).
Example:
<item type="Interaction" answer="talks to" start_time="30" end_time="35"/>
Find the 1-to-1 relationship between scenes and natural language descriptions.
Given a natural language description, find the correct scene that matches this description.
Possible scenes will be given as multiple choice options. The textual description of the scene
will be defined in the <item description>.
Classify scene sentiment from a given scene.
Given a specific movie scene and a set of possible sentiments, select the correct sentiment label for that scene.