
Deep Video Understanding (DVU)

Task Coordinators: Keith Curtis, George Awad, and Asad Butt

Deep video understanding is a difficult task that requires computer vision systems to develop a deep analysis and understanding of the relationships between different entities in video, to use known information to reason about other, more hidden information, and to populate a knowledge graph (KG) with all acquired information. The aim of the task is to push the limits of multimedia analysis techniques to analyse long-duration videos holistically and to extract useful knowledge for solving different kinds of queries. The knowledge targeted by the queries includes both visual and non-visual elements. Participating systems should take all available modalities into consideration (speech, image/video, and in some cases text).

Movies provide an excellent testbed for the needed data because they simulate the real world (people, relationships, locations, actions and interactions, motivations, intentions, etc.), so the DVU task exercises its challenge on the movie domain. As video and multimedia data become ever more widespread across domains, the research, approaches, and techniques we aim to see applied in this task will only grow in relevance in the coming years.

System Task

The task for participating researchers is as follows: given a whole original movie (e.g., 1.5 to 2 hours long), image snapshots of the main entities (persons, locations, and concepts) for each movie, and an ontology of relationships, interactions, locations, and sentiments used to annotate each movie at the global movie level (relationships between entities) as well as at the fine-grained scene level (scene sentiment, interactions between characters, and scene locations), systems are expected to generate a knowledge base of the main actors and their relationships (such as family, work, social, etc.) over the whole movie, and of the interactions between them at the scene level. This representation can then be used to answer a set of queries at the movie level and/or scene level (see details about query types below) for each movie. The task supports two tracks (subtasks), and teams can join one or both: a Movie track, where participants are asked queries at the whole-movie level, and a Scene track, where queries target specific movie scenes.
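
To picture the expected output, the minimal sketch below (in Python) represents a per-movie knowledge base as movie-level relationship triples plus scene-level interactions. All class names, field names, and example labels are illustrative assumptions only; the official run format is defined by the DTD files referenced under Run submission format below.

    # Minimal sketch of a per-movie knowledge base; names and labels are
    # illustrative only, not the official TRECVID DVU run format.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Relationship:            # movie-level edge between two entities
        subject: str               # e.g. "Person A"
        relation: str              # e.g. "Parent Of" (from the task ontology)
        obj: str                   # e.g. "Person B"

    @dataclass
    class SceneInteraction:        # scene-level edge between two characters
        scene_id: int
        subject: str
        interaction: str           # an interaction label from the ontology
        obj: str

    @dataclass
    class MovieKnowledgeBase:
        movie_id: str
        relationships: List[Relationship] = field(default_factory=list)
        interactions: List[SceneInteraction] = field(default_factory=list)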

The DVU challenge is also running externally at ACM Multimedia as a grand challenge. Participants are encouraged to take part in that challenge as well, to get exposed to more comprehensive query types and to be able to submit their solution as a publication in the conference proceedings. Note that the schedule for the grand challenge differs from the TRECVID DVU task schedule. The organizers will do their best to unify the testing dataset used at TRECVID and at the ACM MM Grand Challenge. For detailed information, please check the DVU grand challenge website: https://sites.google.com/view/dvuchallenge2022/

In addition to the grand challenge at ACM Multimedia, the organizers are also running a DVU-related workshop at the 24th ACM International Conference on Multimodal Interaction (7-11 Nov 2022). All teams are invited to submit a paper on their work to the workshop. Papers will be peer reviewed and included in the conference proceedings. The Deep Video Understanding workshop website: https://sites.google.com/view/dvu2022-workshop

Data Resources

  • The Development Dataset

    A set of 14 Creative Commons (CC) movies (total duration of 17.5 hr) previously utilized in the 2020 and 2021 ACM Multimedia DVU Grand Challenges, including their movie-level and scene-level annotations. The movies have been collected from public websites such as Vimeo and the Internet Archive. In total, the 14 movies span diverse genres and consist of 621 scenes, 1572 entities, 650 relationships, and 2491 interactions. The development dataset can be accessed from this URL. Please consult the readme files in the documentation folder for more information on the contents of the dataset.
  • The Testing Dataset

    A set of 6 new movies will be distributed to participating teams. The movies have been licensed by NIST from the Kinolorberedu platform. All task participants will be able to download the movies after signing a data agreement. Please refer to the TRECVID 2022 schedule for availability of testing data, queries, and run submissions.

Subtasks and Query types

  • Movie-level Track

    • Required Query Type: Question Answering (QA)
      This query type (mandatory in the movie-level track) represents questions on the resulting knowledge base of the movies in the testing dataset. For example, we may ask 'How many children does Person A have?', in which case participating researchers should count the 'Parent Of' relationships Person A has in the Knowledge Graph (see the sketch after this list for a toy illustration). This query type will take a multiple-choice question format.
    • Optional Query Type: fill in the graph space
      Fill in spaces in the Knowledge Graph (KG). Given the listed relationships, events, or actions for certain nodes, where some nodes are replaced by variables X, Y, etc., solve for X, Y, etc. Example from The Simpsons: X Married To Marge. X Friend Of Lenny. Y Volunteers at Church. Y Neighbor Of X. The solution for X and Y in that case would be: X = Homer, Y = Ned Flanders (also illustrated in the sketch after this list).

    Below is a sample movie-level QA query. Image snapshots of the entities mentioned in the query (persons and locations) will be given with the testing dataset and queries.

  • Scene-level Track

    • Required Query Type: find next or previous interaction
      Given a specific scene and a specific interaction between person X and person Y, participants will be asked to return either the previous or the next interaction between person X and person Y. This can be the next or previous interaction within the same scene, or over the entire movie. This query type will take a multiple-choice question format and is considered mandatory in the scene-level track.
    • Optional Query Type: find the unique scene
      Given a full, inclusive list of interactions unique to a specific scene in the movie, teams should identify which scene it is.

    Below is a sample scene-level find-an-interaction query. Image snapshots of the entities mentioned in the query (two persons) will be given with the testing dataset and queries, while the interactions are included in the ontology distributed with the task.
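
To make the two movie-level query types above concrete, the sketch below answers both against a toy knowledge graph of (subject, relation, object) triples built from the Simpsons example. The graph contents, relation labels, and function names are assumptions made for illustration, not part of the official task data or scoring code.

    from itertools import permutations

    # Toy movie-level knowledge graph; entities and labels are illustrative.
    KG = [
        ("Homer", "Married To", "Marge"),
        ("Homer", "Friend Of", "Lenny"),
        ("Ned Flanders", "Volunteers At", "Church"),
        ("Ned Flanders", "Neighbor Of", "Homer"),
        ("Homer", "Parent Of", "Bart"),
        ("Homer", "Parent Of", "Lisa"),
        ("Homer", "Parent Of", "Maggie"),
    ]

    def count_relation(kg, subject, relation):
        """QA example: 'How many children does Homer have?' is answered by
        counting the 'Parent Of' edges leaving the subject node."""
        return sum(1 for s, r, o in kg if s == subject and r == relation)

    def solve_variables(kg, pattern, variables=("X", "Y")):
        """Fill-in-the-graph example: find a binding of the variables to
        entities such that every triple in the pattern exists in the graph."""
        entities = sorted({t for s, _, o in kg for t in (s, o)})
        for candidate in permutations(entities, len(variables)):
            binding = dict(zip(variables, candidate))
            grounded = [(binding.get(s, s), r, binding.get(o, o))
                        for s, r, o in pattern]
            if all(t in kg for t in grounded):
                return binding
        return None

    print(count_relation(KG, "Homer", "Parent Of"))    # -> 3
    print(solve_variables(KG, [("X", "Married To", "Marge"),
                               ("X", "Friend Of", "Lenny"),
                               ("Y", "Volunteers At", "Church"),
                               ("Y", "Neighbor Of", "X")]))
    # -> {'X': 'Homer', 'Y': 'Ned Flanders'}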

Metrics

  • Movie-level : question answering
    Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.
  • Movie-level : fill in the graph space
    Results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query (see the sketch after this list).
  • Scene-level : find next or previous interaction
    Scores for this query type will be calculated as the number of correct answers divided by the total number of questions.
  • Scene-level : find the unique scene
    Results will be treated as a ranked list of result items per unknown variable; a Reciprocal Rank score will be calculated per unknown variable and a Mean Reciprocal Rank (MRR) per query.
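
As a rough illustration of these two scoring rules, the sketch below computes accuracy for the multiple-choice queries and Mean Reciprocal Rank for the ranked-list queries. The data structures and example values are assumptions for illustration, not the official evaluation code.

    def accuracy(answers, ground_truth):
        """Multiple-choice scoring: correct answers / total questions."""
        correct = sum(1 for q, truth in ground_truth.items()
                      if answers.get(q) == truth)
        return correct / len(ground_truth)

    def mean_reciprocal_rank(ranked_lists, ground_truth):
        """Ranked-list scoring: Reciprocal Rank per unknown variable,
        averaged into a Mean Reciprocal Rank (MRR) per query."""
        rr = []
        for var, ranking in ranked_lists.items():
            truth = ground_truth[var]
            rr.append(1.0 / (ranking.index(truth) + 1) if truth in ranking else 0.0)
        return sum(rr) / len(rr)

    # Hypothetical query with two unknown variables X and Y.
    print(mean_reciprocal_rank(
        {"X": ["Homer", "Ned Flanders"], "Y": ["Homer", "Ned Flanders"]},
        {"X": "Homer", "Y": "Ned Flanders"}))    # -> (1 + 1/2) / 2 = 0.75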

Run submission format

Each participating team can submit up to 4 runs per track (movie or scene). Each run should contain results for all queries in the testing dataset. Please see the provided DTD files for the run formats of both movie-level and scene-level results.
Also provided are a small XML example for a movie-level run and one for a scene-level run. Below are sample queries and responses for movie-level and scene-level queries:
  • Movie-level question answering sample query:
  • Movie-level question answering Sample response:
  • Movie-level fill in the graph sample query:
  • Movie-level fill in the graph Sample response:
  • Scene-level next interaction sample query:
  • Scene-level next interaction Sample response:
  • Scene-level previous interaction sample query:
  • Scene-level previous interaction Sample response:
  • Scene-level find unique scene sample query:
  • Scene-level find unique scene Sample response:
