Call for Participation in TRECVID 2022


February 2022 - December 2022

Conducted by the National Institute of Standards and Technology (NIST)
with additional funding from other US government agencies. Below you
can find an overview of the datasets used, the tasks, and how to apply to
participate. All teams are encouraged to apply early to get access to
data and join the Slack workspace of active teams.
Please consult the guidelines for each task for more details.

Application URL:

Application deadline: June 1. [apply early to get access to task discussions, participants mailing lists, and datasets]


The TREC Video Retrieval Evaluation series promotes
progress in content-based analysis of and retrieval from digital video
via open, metrics-based evaluation. TRECVID is a laboratory-style
evaluation that attempts to model real-world situations or significant
component tasks involved in such situations. In its 22nd annual evaluation
cycle TRECVID will evaluate participating systems on 6 different video
analysis and retrieval tasks using various types of real-world datasets.
Below are the main datasets to be used in 2022 across the 6 proposed tasks.


In TRECVID 2022 NIST will use at least the following data sets:

      * Vimeo Creative Commons Collection (V3C)

      The V3C is a large-scale video dataset collected from high-quality
      web videos spanning several years in order to represent true videos
      in the wild. It consists of 28,450 videos with a total duration of 3,801 hours.
      In 2022, the V3C2 subcollection (1,300 hr and 1.4 million shots) will be used
      as the testing dataset, while V3C1 (1,000 hr and 1 million shots), previously adopted at
      TRECVID from 2019 to 2021, will serve as a development dataset. The new V3C2 subcollection is
      planned to be adopted from 2022 to 2025.

      * IACC.3

      The IACC.3 was introduced in 2016 and consists of approximately 4600 Internet
      Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264
      format with duration ranging from 6.5 min to 9.5 min and a mean duration
      of almost 7.8 min. Most videos have some donor-provided metadata
      available, e.g., title, keywords, and description. The
      IACC.3 is provided as a development dataset for teams.

      * Kino Lorber Edu movies

      A set of 10 movies licensed from Kino Lorber Edu
      will be available to support the movie summarization and deep video understanding tasks.
      All movies are in English, with durations between 1.5 and 2 hrs each. Participants will
      be able to download the whole original movies and use the data for research purposes only
      within TRECVID tasks.

      * Deep Video Understanding (DVU)

      A set of 14 movies (total duration of 17.5 hr) with Creative Commons licenses, previously
      utilized at the ACM Multimedia Grand Challenges in 2020 and 2021, will be available.
      The dataset contains movie-level and scene-level annotations. The movies have been collected
      from public websites such as Vimeo and the Internet Archive. In total, the 14 movies consist
      of 621 scenes, 1572 entities, 650 relationships, and 2491 interactions.

      * Twitter Vine videos

      URLs for approximately 8,000 six-second video clips from the public Twitter stream of Vine videos
      were annotated by humans with video captions between 2016 and 2019. These Vine videos will be provided
      as additional development data for participants of the Video-to-Text (VTT) task.

      * Gatwick and i-LIDS MCT airport surveillance video

      The data consist of about 150 hours of airport
      surveillance video (courtesy of the UK Home Office). The
      Linguistic Data Consortium has provided event annotations for
      the entire corpus. The corpus was divided into development and
      evaluation subsets. Annotations for the 2008 development and test
      sets are available.

      * MEVA dataset

      The TRECVID ActEV 2022 Challenge is based on the Multiview Extended Video
      with Activities (MEVA) Known Facility (KF) dataset. The large-scale MEVA
      dataset is designed for activity detection in multi-camera environments.
      It was created under the Intelligence Advanced Research Projects Activity (IARPA)
      Deep Intermodal Video Analytics (DIVA) program to support DIVA performers and
      the broader research community. You can download the public MEVA resources
      (training video, training annotations and the test set) at

      * LADI dataset

      The Low Altitude Disaster Imagery (LADI) dataset is hosted as part of the AWS Public Dataset program
      and will be available to participants of the DSDI task as development data. It consists of more than 20,000
      annotated images, each at least 4 MB in size. The annotated features were selected based on a recommendation
      from the public safety community. In total there are 32 features across 5 categories. The dataset was
      collected between 2015 and 2019 during major natural disaster events (e.g., hurricanes, floods, earthquakes)
      across several US states. The low-altitude criterion is intended to further distinguish the LADI dataset
      from satellite or "top down" datasets and to support development of computer vision capabilities
      for small drones operating at low altitudes. A minimum image size was selected to maximize the efficiency
      of the crowdsourced workers. For more information about LADI, please refer to the GitHub organization.


In TRECVID 2022 NIST will evaluate systems on the following tasks
using the [data] indicated:

    * AVS: Ad-hoc Video Search (automatic, manually-assisted, relevance feedback) [V3C2]

      The Ad-hoc search task started in TRECVID 2016 and will continue in 2022
      to model the use case of an end user who is looking for
      segments of video containing persons, objects, activities, locations, etc.,
      and combinations of these. Given about 30 textual queries created at
      NIST, systems return for each query all the shots that meet the video need expressed
      by it, ranked in order of confidence. Although all evaluated submissions will be
      for automatic runs, interactive systems will have the opportunity to
      participate in the Video Browser Showdown (VBS) in 2022 using
      the same testing data (V3C2).

    * ActEV: Activities in Extended Video [MEVA]

      ActEV is a series of evaluations to accelerate development of robust, multi-camera,
      automatic activity detection algorithms for forensic and real-time alerting applications.
      ActEV is an extension of the annual TRECVID Surveillance Event Detection (SED) evaluation
      where systems will also detect and track objects involved in the activities. Each evaluation
      will challenge systems with new data, system requirements, and/or new activities.

    * DVU: Deep Video Understanding [Kino Lorber Edu movies]

      Deep video understanding is a difficult task which requires computer vision systems to develop
      a deep analysis and understanding of the relationships between different entities in video,
      and to use known information to reason about other, more hidden information.
      The aim of the proposed task is to push the limits of multimedia analysis techniques to
      analyse long-duration videos holistically and extract useful knowledge that can be used
      to answer different kinds of queries. The knowledge in the target queries includes both visual
      and non-visual elements. Participating systems should take into consideration all available
      modalities (speech, image/video, and in some cases text).

      The task for participating researchers is as follows. Given a whole original movie
      (e.g., 1.5 - 2 hrs long), image snapshots of the main entities (persons, locations, and
      concepts) per movie, and an ontology of relationships, interactions, locations, and sentiments
      used to annotate each movie at the global movie level (relationships between entities)
      as well as at the fine-grained scene level (scene sentiment, interactions between characters,
      and locations of scenes), systems are expected to generate a knowledge base of the main actors
      and their relations (such as family, work, social, etc.) over the whole movie, and of the
      interactions between them at the scene level. This representation can be used to answer a set
      of queries at the movie level and/or scene level per movie.

      The task will support two tracks (subtasks), and teams can join one or both: a Movie track,
      where participants are asked queries at the whole-movie level, and a Scene track, where
      queries target specific movie scenes.

    * VTT: Video to Text Description [V3C2]

      Automatic annotation of videos using natural language text descriptions has been a long-standing goal
      of computer vision. The task involves understanding of many concepts such as objects, actions,
      scenes, person-object relations, temporal order of events and many others. In recent years there have
      been major advances in computer vision techniques which have enabled researchers to start working
      on such problems in practice. Given a set of short video clips, systems are asked to work on and
      submit results for two subtasks: The "Description Generation" subtask requires systems to automatically
      generate a text description (1 sentence) for each video clip.

    * MSUM: Movie Summarization [Kino Lorber Edu movies]

      An important need in many situations involving video collections (archive video search/reuse,
      personal video organization/search, movies, TV shows, etc.) is to summarize the video in order
      to reduce the size and concentrate the amount of high-value information in the video track.
      In 2022 a new movie summarization track in TRECVID will ask participating teams to
      summarize the major key-fact events of specific characters over the whole movie.
      The task will support visual (video) summary as well as textual summary tracks. The objective
      of the task is not only summarization, but also testing systems on their ability to detect salient events
      to construct a meaningful summary.

    * DSDI: Disaster Scene Description and Indexing [Real world natural disaster video and image footage]

      Computer vision capabilities have rapidly been advancing and are expected to become an important
      component to incident and disaster response. However, the majority of computer vision capabilities
      are not meeting public safety needs, such as support for search and rescue, due to the lack
      of appropriate training data and requirements. In response, the organizers developed a dataset of
      images collected by the Civil Air Patrol of various natural disasters. Two key distinctions are the
      low altitude and oblique perspective of the imagery and disaster-related features, which are rarely
      featured in computer vision benchmarks and datasets. This task invites researchers to work on this
      new domain to develop new capabilities and close the gap in performance to essentially label short
      video clips with the correct disaster-related feature(s).
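
As an illustration of the ranked output described for the AVS task above, the following is a minimal Python sketch, not the official submission format. It assumes a system has already produced one confidence score per shot for a query; the function name, the shot IDs, and the `max_results` cutoff are hypothetical and are not taken from the guidelines.

```python
from typing import Dict, List, Tuple


def rank_shots(scores: Dict[str, float], max_results: int = 1000) -> List[Tuple[str, float]]:
    """Return (shot_id, score) pairs sorted by descending confidence.

    Ties are broken by shot ID so that the ordering is deterministic.
    `max_results` is an assumed cutoff, not a value from the guidelines.
    """
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_results]


# Hypothetical confidence scores for one query; the shot IDs are made up.
example_scores = {
    "shot07476_1": 0.91,
    "shot07476_2": 0.35,
    "shot08001_4": 0.78,
}

ranking = rank_shots(example_scores)
for rank, (shot_id, score) in enumerate(ranking, start=1):
    print(rank, shot_id, score)
```

The required run format and the maximum number of shots per query are defined in the task guidelines, which should be consulted before preparing a submission.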

In addition to the data, TRECVID will provide uniform scoring procedures and a forum for organizations
interested in comparing their approaches and results.

Participants will be encouraged to share resources and intermediate system outputs to lower entry barriers
and enable analysis of various components' contributions and interactions.

* You are invited to participate in TRECVID 2022 *

The evaluation is defined by the Guidelines. A draft version is
available, and further feedback from participants is welcome until April 2022.

You should read the guidelines carefully before applying to participate in one or more tasks:

Please note

1) Dissemination of TRECVID work and results other than in the
(publicly available) conference proceedings is welcomed, but the
conditions of participation specifically preclude any advertising
claims based on TRECVID results.

2) All system output and results submitted to NIST are published in
the Proceedings or on the public portions of the TRECVID website archive.

3) The workshop is open to participating groups that submit
results for at least one task, to selected government personnel
from sponsoring agencies, data donors, and interested researchers
who may never have participated before and would like to know more about TRECVID.

4) Each participating group is required to submit before the
workshop a notebook paper describing their experiments and results.
This is true even for groups who may not be able to attend the
workshop.
5) It is the responsibility of each team contact to make sure that
information distributed via the call for participation and the email list is disseminated to all team members with
a need to know. This includes information about deadlines and
restrictions on use of data.

6) By applying to participate you indicate your acceptance of the
above conditions and obligations.

A tentative schedule for the tasks is included in the Guidelines
webpage.

Workshop format

Whether the 2022 workshop will be in-person, hybrid, or virtual is still
to be decided. Details will be provided to participants as soon as available.

The TRECVID workshop is used as a forum both for presentation of
results (including failure analyses and system comparisons), and for
more lengthy system presentations describing retrieval techniques
used, experiments run using the data, and other issues of interest to
researchers in information retrieval and computer vision. As there is
a limited amount of time for these presentations, the evaluation coordinators
and NIST will determine which groups are asked to speak and which groups will
present in a poster session. Groups that are interested in having a
speaking slot during the workshop will be asked to submit a short
abstract before the workshop describing the experiments they
performed. Speakers will be selected based on these abstracts.

How to respond to this call

Organizations wishing to participate in TRECVID 2022 must respond
to this call for participation by submitting an on-line application by
1 June at the latest (the earlier the better). Only ONE APPLICATION PER TEAM
please, regardless of how many organizations the team comprises.

*PLEASE* only apply if you are able and fully intend to complete the
work for at least one task. Taking the data but not submitting any
runs threatens the continued operation of the workshop and the
availability of data for the entire community.

Here is the application URL:

You will receive an immediate automatic response when your application
is received. NIST will respond with more detail to all applications submitted
before the end of March.  At that point you'll be
given the active participant's userid and password, be subscribed to
the tv22.list email discussion list, and can participate in finalizing
the guidelines as well as sign up to get the data, which is controlled
by separate passwords. All active teams will also be added to a Slack
workspace to encourage more communication and facilitate announcements.

TRECVID 2022 email discussion list

The tv22.list email discussion list will serve as
the main forum for discussion and for disseminating information about
TRECVID 2022.  It is each participant's responsibility to monitor the
tv22.list postings.  It accepts postings only from the email addresses
used to subscribe to it. An archive of past postings is available using the active
participant's userid/password.

Questions?

Any administrative questions about conference participation,
application format/content, subscriptions to the tv22.list,
etc. should be sent to george.awad at

Best regards,

TRECVID 2022 organizers team

National Institute of Standards and Technology
Date created: Wednesday, 26-Jan-22
For further information contact