Welcome to TRECVID 2022!

The main goal of the TREC Video Retrieval Evaluation (TRECVID) is to promote progress in content-based analysis of and retrieval from digital video via open, metrics-based evaluation. TRECVID is a laboratory-style evaluation that attempts to model real-world situations or significant component tasks involved in such situations.

Previous TRECVID evaluations

Up until 2010, TRECVID used test data from a small number of known professional sources - broadcast news organizations, TV program producers, and surveillance systems - that imposed limits on program style, content, production qualities, language, etc. For example, from 2003 to 2006 TRECVID supported experiments in automatic segmentation, indexing, and content-based retrieval of digital video using broadcast news in English, Arabic, and Chinese. From 2007 to 2009 TRECVID provided participants with cultural, news magazine, documentary, and educational programming supplied by the Netherlands Institute for Sound and Vision. Tasks using this video included segmentation, search, feature extraction, and copy detection. Surveillance event detection was evaluated using airport surveillance video provided by the UK Home Office.

In 2010 TRECVID confronted known-item search and semantic indexing systems with a new set of Internet videos (referred to in what follows as IACC) characterized by a high degree of diversity in creator, content, style, production qualities, original collection device/encoding, language, etc. - as is common in much "Web video". The collection also has associated keywords and descriptions provided by the video donor. The videos are available under Creative Commons licenses from the Internet Archive. The only selection criterion imposed by TRECVID beyond the Creative Commons licensing is video duration: the videos are short (less than 6 min). In addition to the IACC data set, NIST began developing an Internet multimedia test collection (HAVIC) with the Linguistic Data Consortium and used it in growing amounts (up to 8000 h) in the TRECVID 2010-2017 Multimedia Event Detection (MED) task. The airport surveillance video, introduced in TRECVID 2009, was reused each year up to 2017 within the Surveillance Event Detection (SED) task.

New in 2013 was video provided by the BBC. Programming from their long-running EastEnders series was used in the instance search (INS) task. An additional 600 h of Internet Archive video available under Creative Commons licensing for research (IACC.2) was used for the semantic indexing task as planned from 2013 to 2015, with new test data each year. In addition, a new concept localization (LOC) task was introduced in 2013 and ran until 2016.

In 2015 a new Video Hyperlinking (LNK) task, previously run in MediaEval, was added. It ran until 2017 and was updated in 2018 to address social media storytelling linking.

From 2016 to 2018 the Ad-hoc Video Search (AVS) task used a new IACC.3 dataset (600 h), and in 2019 it introduced the V3C dataset (1000 h of Vimeo videos). A new pilot "Video to Text" (VTT) description task was introduced in 2016 to address matching and describing videos using textual descriptions.

A new video activity detection (ActEV) task was introduced in 2018 as an extension to the SED task. Finally, in 2020 two new tasks were introduced: the Video Summarization (VSUM) task, using the BBC EastEnders dataset, and the Disaster Scene Description and Indexing (DSDI) task, which addresses recognizing and indexing visual concept features highly correlated with airborne natural disaster video footage from real-world disaster events.

Many resources created by NIST and the TRECVID community are available for continued research on past datasets independent of TRECVID. See the Datasets and Resources section of the TRECVID website for pointers.

TRECVID 2022 Tasks

In TRECVID 2022, four tasks (AVS, VTT, ActEV, and DSDI) will continue with some revisions, and two new tasks (MSUM and DVU) will start.

Ad-hoc Video Search (AVS) [Retrieve videos from text query]
Activities in Extended Video (ActEV) [Detect activities from long surveillance videos]
Deep Video Understanding (DVU) [Answer questions about movies]
Video to Text (VTT) [Provide a description for short videos]
Movie Summarization (MSUM) [Summarize the main events of movie characters]
Disaster Scene Description and Indexing (DSDI) [Classify features in Low Altitude Disaster Imagery]

Digital Video Retrieval at NIST
News magazine, science news, news reports, documentaries, educational programming, and archival video
TV Episodes
Airport Security Cameras & Activity Detection
Video collections from News, Sound & Vision, Internet Archive, Social Media, BBC EastEnders