The 2023 DVU development dataset consists of 19 movies in total: 14 Creative Commons (CC) movies and 5 licensed KinoLorber (https://kinolorberedu.com/) movies. The 5 KinoLorber movies (used as the testing set in 2022) are hosted on a password-protected site, and a signed data agreement is required before they can be downloaded.

The training dataset directory structure is as follows:

1- The 14 CC movies:
  - : This directory contains the set of whole movies as mp4 files.
  - : This directory has a folder for each of the 14 movies, as follows:
    - .tgf : This file includes the annotated static knowledge graph of the whole movie in trivial graph format (TGF).
    - knowledge.graph.tgf.header : This file explains the contents and format of the tgf file.
    - .png : An image of the knowledge graph for human inspection.
    - .actions.events.txt : This file includes the annotated significant actions and events performed by the actors in the movie.
    - actions.events.header : This file explains the contents and format of the actions.events file.
      NOTE: Not all movies include actions.events annotations; only 10 of the 14 do.
    - images folder : This directory contains snapshot images of all main entities in the movie, including main locations and actors.
    - .entity.types.txt : This file defines the type of each entity in the movie (person, location, or concept {e.g. an important object or idea}).
    - HLVU_Relationships_Definitions.xlsx : The ontology of all relations used in the static movie knowledge graphs for the 14 movies. Annotators chose the relations used in the tgf files from this ontology (please check the readme file: dvu.development.dataset/movie_knowledge_graph/HLVU_relations_definitions_readme.txt).
    - .testing.queries.groundTruth : If this movie was used as a test movie in 2020/2021, this folder contains the xml queries, a correct submission, and a text file with the answers.
    NOTE: All entity names are consistent across the tgf (knowledge graph) file, the actions.events file, and the snapshot image file names.
  - : This directory contains the segmented movie scenes as webm files. Each scene file is named as -.webm
  - : This directory contains, for each movie, a master scene segmentation reference file (.csv) denoting the start and end times of each scene in the movie.
  - : This directory has a json file for each scene in each movie, encoding the knowledge graph for that scene. PLEASE see the scene.kg.readme file for more details.
    - .testing.queries.groundTruth : This sub-directory has, for each of the 4 testing movies in 2021, the queries and ground truth at the scene level.
  - : This directory contains a txt file for each scene in each movie, summarizing the scene in natural language in a few sentences.
    NOTE: Scene annotations for Bagman and The_Illusionist only go up to scene 47 for each, due to time constraints during the annotation process. No queries are given about the later scenes, for which there are no annotations.
  - vocab.dvu.json : This file contains the vocabulary used in the scene annotations (json graph files). Specifically, it has a set of:
    - Emotional states [used by annotators to describe actors' non-neutral emotions when observed]
    - Interactions [used by annotators to describe the type of interaction that may have happened between at least two actors in a scene]
    - Relationships [used by annotators to establish a relationship between any two actors when it became apparent to them]
    - Sentiments [used by annotators to assign at least one sentiment to each scene]
    - Locations [used by annotators to describe the type of location where the scene happened]
  - resources : This folder contains resources (e.g. annotations and outputs) donated by the DVU 2020 teams (Univ. of Zurich and TokyoTech).
  - human.generated.QA.annotations : This folder includes, for each training movie, a set of human-generated questions and answers delimited by ":". Please check the readme file human.generated.QA.annotations/human.QA.readme.txt for format details.

2- : This directory includes the movie-level and scene-level annotations for the 5 licensed movies. The annotations follow the same format as the CC movie set. The only difference is that the actual movie and shot files need to be downloaded separately after signing the data agreement form found here: kinolorber.dataset/DVU.Data.Agreement.Form.txt (please specify that you need the training dataset of the KinoLorber movies for the DVU task).

NOTE: To the best of our knowledge, the following 4 movies are Creative Commons licensed. Links to license details on free YouTube hosting:
  - https://www.youtube.com/watch?v=TH-ep4nCNWw - "Casino Jack" aka "Bagman"
  - https://www.youtube.com/watch?v=j5JQ1iaq6rE - "Road to Bali"
  - https://www.youtube.com/watch?v=KunQC6a6fPU - "The Illusionist"
  - https://www.youtube.com/watch?v=D3LBZLA0V3k - "Manos: The Hands of Fate"
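The movie-level knowledge graphs are distributed as .tgf files. The sketch below is a minimal loader for them, assuming the files follow the standard Trivial Graph Format layout: one node per line ("<id> <label>"), a line containing only "#", then one edge per line ("<source_id> <target_id> <label>"). The exact DVU conventions are documented in knowledge.graph.tgf.header, so check that file before relying on this. The names KnowledgeGraph, parse_tgf, and named_triples are hypothetical helpers, not part of the dataset.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class KnowledgeGraph:
    """Hypothetical container: node id -> label, plus (source, target, relation) edges."""
    nodes: dict[str, str] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)


def parse_tgf(text: str) -> KnowledgeGraph:
    """Parse text in the standard Trivial Graph Format (assumed, not confirmed for DVU).

    Node lines are "<id> <label>", a lone "#" separates nodes from edges,
    and edge lines are "<source_id> <target_id> <label>". Labels may contain spaces.
    """
    graph = KnowledgeGraph()
    in_edges = False
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # ignore blank lines
        if line == "#":
            in_edges = True  # everything after the separator is an edge
            continue
        if not in_edges:
            node_id, _, label = line.partition(" ")
            graph.nodes[node_id] = label.strip()
        else:
            parts = line.split(maxsplit=2)
            if len(parts) < 2:
                continue  # malformed edge line: needs at least two endpoints
            relation = parts[2] if len(parts) == 3 else ""
            graph.edges.append((parts[0], parts[1], relation))
    return graph


def named_triples(graph: KnowledgeGraph) -> list[tuple[str, str, str]]:
    """Replace node ids with their labels so each edge reads (entity, relation, entity)."""
    return [
        (graph.nodes.get(src, src), rel, graph.nodes.get(dst, dst))
        for src, dst, rel in graph.edges
    ]
```

Resolving ids to labels with named_triples yields readable (entity, relation, entity) triples. Since the dataset states that entity names are consistent across the tgf files, the actions.events files, and the snapshot image file names, these labels should also serve as join keys into those other annotations.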