Deep Video Understanding Queries

Overview

The Deep Video Understanding (DVU) testing queries can be found in the following directory structure:

In addition to the above folders, each movie has a MOVIE_NAME.entity.types.txt file that lists the type of each key entity in the movie (person, location, or animal). Each entity name should have corresponding image examples in the "images" folder. Queries at the movie and scene level use exactly these names to refer to persons or other entities (locations) in the movie.

All relations between entities (person to person, person to location) are to be selected from the relationship ontology provided:

  1. Please see the movie-level relationships ontology: https://www-nlpir.nist.gov/projects/trecvid/dvu/dvu.development.dataset/movie_knowledge_graph/
  2. Please see the scene-level ontology of relationships, interactions, locations, and sentiments:
    https://www-nlpir.nist.gov/projects/trecvid/dvu/dvu.development.dataset/vocab.dvu.json
    The relationships in this JSON file include all relationships in the movie-level relationships ontology.

All interactions between persons are to be selected from the vocab.dvu.json file provided. All sentiments of scenes are to be selected from the same file. Please note that the term "entity" in this file refers to a location in the above entity.types files. A sample XML and DTD will be provided/updated and should be followed to correctly format your run submissions (please check the website for updates).

Query types

The following is an overview of the movie-level and scene-level query types. Participants can submit runs to the movie-level track, the scene-level track, or both. Each of the two tracks has one optional query type; all other query types are required for teams participating in that track. Please refer to the task guidelines page for more details.

Movie-level

Movie-level Query type 1 (Optional):

Fill in the part of graph question. The task for systems will be to identify the person / entity labelled Unknown_#. All of Unknown's relations with other people / entities / concepts are listed. In cases where one of these related nodes occurs more than once in the part of graph questions, that node's name has been replaced with <BLANK>. Therefore, any node labelled <BLANK> is guaranteed to be one of the nodes named in this group of questions. The subject type will always be the source person we are asking about. The predicate will always be that person’s relation with another person, entity, or concept. The subject in this question always contains the Unknown you are being asked to identify.
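
As an illustration only, the matching step behind this query type can be sketched as follows. The sketch assumes a system has already built its own knowledge graph as (subject, predicate, object) triples; the function name, the triple layout, and the person and relation names are all hypothetical and are not taken from the DVU ontology or the submission format. For simplicity, a <BLANK> object is treated as matching any node, which is looser than the guarantee stated above.

```python
from collections import defaultdict

BLANK = "<BLANK>"  # placeholder used in the query for a repeated node


def candidates_for_unknown(graph_triples, unknown_relations):
    """Return nodes whose outgoing edges cover every relation listed for the Unknown.

    graph_triples: iterable of (subject, predicate, object) from a system-built graph.
    unknown_relations: list of (predicate, object) pairs taken from the query.
    Simplifying assumption: an object equal to <BLANK> matches any node.
    """
    outgoing = defaultdict(set)
    for subj, pred, obj in graph_triples:
        outgoing[subj].add((pred, obj))

    matches = []
    for node, edges in outgoing.items():
        predicates = {pred for pred, _ in edges}
        if all(
            (pred in predicates) if obj == BLANK else ((pred, obj) in edges)
            for pred, obj in unknown_relations
        ):
            matches.append(node)
    return sorted(matches)


# Hypothetical graph and query; names and relations are made up for illustration.
triples = [
    ("Anna", "sibling_of", "Ben"),
    ("Anna", "lives_at", "Farmhouse"),
    ("Ben", "sibling_of", "Anna"),
    ("Ben", "lives_at", "Apartment"),
    ("Cara", "friend_of", "Anna"),
]
query = [("sibling_of", "Ben"), ("lives_at", BLANK)]
print(candidates_for_unknown(triples, query))
```

If more than one candidate is returned, a system would need an additional ranking step; that step is outside the scope of this sketch.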

Movie-level Query type 2 (Required):

Multiple choice questions. The task for systems will be to identify the correct answer for Unknown out of the 6 possible answers provided. The subject type will always be the source person we are asking about. The predicate will always be that person’s relation with another person, entity, or concept. In this question the Unknown you are being asked to identify will always be in either the predicate or object.

**************************************************************************************************************

Scene-level

Scene-level Query type 1 (Optional):

Find the unique scene. Given a full, inclusive list of interactions unique to a specific scene in the movie, teams should find which scene this is. The subject type will always be the scene to be identified. Teams will return the scene ID, based on the segmented scenes reference files (CSV files) and/or segmented movie shots.
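
A minimal sketch of one way to approach this query, assuming a system has already predicted a set of interactions for each scene. The scene IDs, the (subject, interaction, object) layout, and the interaction labels below are hypothetical; real labels would come from vocab.dvu.json and real scene IDs from the scene reference files. The sketch also reads "full, inclusive list" as an exact set match, which is an interpretation rather than something the task text spells out.

```python
def find_unique_scene(scene_interactions, query):
    """Return the single scene ID whose interactions exactly match the query list.

    scene_interactions: dict mapping scene ID -> iterable of
        (subject, interaction, object) tuples predicted by the system.
    query: list of (subject, interaction, object) tuples from the question.
    Returns None when zero or several scenes match.
    """
    wanted = frozenset(query)
    matches = [
        scene_id
        for scene_id, interactions in scene_interactions.items()
        if frozenset(interactions) == wanted
    ]
    return matches[0] if len(matches) == 1 else None


# Hypothetical predictions for two scenes.
scenes = {
    "scene_1": [("Anna", "greets", "Ben")],
    "scene_2": [("Anna", "greets", "Ben"), ("Ben", "hugs", "Anna")],
}
question = [("Ben", "hugs", "Anna"), ("Anna", "greets", "Ben")]
print(find_unique_scene(scenes, question))
```

With noisy predictions an exact match will often fail, so a practical system would more likely rank scenes by overlap with the query list.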

Scene-level Query type 2 (Required):

Find next interaction in scene X between person Y and person Z. Given a specific scene X and a specific interaction between person Y and person Z, participants will be asked to select the next interaction between person Y and person Z, in either scene X or scene X + N, from a set of multiple choice options of different interactions. The two persons in this question are always the subject and object, while the interaction is the predicate.
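
The lookup described above can be sketched as follows, assuming a system has produced a chronologically ordered timeline of interactions across scenes. The timeline layout, function name, persons, and interaction labels are hypothetical. The sketch treats the two persons as an unordered pair, which is a simplifying assumption; a system that cares about direction would compare subject and object separately.

```python
def next_interaction(timeline, scene, person_y, person_z, given, options):
    """Pick the interaction that follows `given` between two persons.

    timeline: list of (scene_number, subject, interaction, object) tuples,
        already sorted in chronological order (an assumption of this sketch).
    scene: the scene number X in which the given interaction occurs.
    options: the multiple choice interaction labels from the question.
    Returns the chosen label, or None if no answer can be derived.
    """
    pair = {person_y, person_z}
    # Keep only events between the two persons, preserving order.
    pair_events = [
        (scene_no, interaction)
        for scene_no, subj, interaction, obj in timeline
        if {subj, obj} == pair
    ]
    for index, (scene_no, interaction) in enumerate(pair_events):
        if scene_no == scene and interaction == given:
            following = pair_events[index + 1:]
            if following and following[0][1] in options:
                # May lie in scene X itself or in a later scene X + N.
                return following[0][1]
            return None
    return None


# Hypothetical timeline spanning scenes 3 and 4.
timeline = [
    (3, "Anna", "greets", "Ben"),
    (3, "Ben", "asks", "Anna"),
    (4, "Anna", "hugs", "Ben"),
    (4, "Cara", "talks to", "Anna"),
]
print(next_interaction(timeline, 3, "Anna", "Ben", "asks", ["hugs", "yells at", "greets"]))
```

In this example the answer comes from scene 4 because no further interaction between the two persons occurs in scene 3.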

Scene-level Query type 3 (Required):

Find previous interaction in scene X between person Y and person Z. Given a specific scene X and a specific interaction between person Y and person Z, participants will be asked to select the previous interaction between person Y and person Z, in either scene X or scene X - N, from a set of multiple choice options of different interactions. The two persons in this question are always the subject and object, while the interaction is the predicate.