Ad-hoc Video Search (AVS)

Task Coordinators: Georges Quénot and George Awad

The Ad-hoc search task ended a 3-year cycle (2016-2018) whose goal was to model the end-user search use case: a user searching, with textual sentence queries, for segments of video containing persons, objects, activities, locations, etc., and combinations of these. While the Internet Archive (IACC.3) dataset was used from 2016 to 2018, in 2019 a new data collection based on the Vimeo Creative Commons (V3C) dataset was adopted to support the task for at least 3 more years. In 2020 the AVS task will test systems on a new set of queries in addition to a common (fixed) progress query set, to measure system progress since 2019.

System Task

Given the test collection (V3C1), the master shot boundary reference, and a set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked by their likelihood of containing what the query describes. Two evaluation tasks will be supported: the main task, with new queries each year, and a progress subtask to measure system progress across 2 or 3 years.
A new optional component of the task this year allows systems to report "why" they believe a submitted shot may be relevant to the query. This is done through optional explainability attributes attached to each submitted shot in the XML run file (the timestamp of a frame within the shot and the associated bounding box coordinates).
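The ranking requirement (at most 1000 shot IDs per query, best first) can be sketched as follows. This is a minimal illustration assuming a system has already produced one relevance score per shot; the function name `rank_shots`, the score dictionary, and the example shot IDs are hypothetical, and how the scores are computed is left entirely to each system.

```python
from typing import Dict, List

MAX_SHOTS_PER_QUERY = 1000  # limit stated in the task description

def rank_shots(scores: Dict[str, float], limit: int = MAX_SHOTS_PER_QUERY) -> List[str]:
    """Return up to `limit` shot IDs ordered by descending score.

    `scores` is a hypothetical mapping from shot ID to a system-specific
    likelihood score for one query.
    """
    # Sort by descending score; ties are broken by shot ID so output is deterministic.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [shot_id for shot_id, _ in ordered[:limit]]

example = {"shot1_1": 0.2, "shot1_2": 0.9, "shot2_5": 0.5}
print(rank_shots(example))  # ['shot1_2', 'shot2_5', 'shot1_1']
```

Truncating after sorting keeps the code simple; for very large candidate sets, `heapq.nlargest` would avoid sorting every shot.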

1- Main Task (2019 - 2021):

Systems will be asked to return results for *New unseen* queries annually as follows:
30 New queries in 2019, where all 30 queries will be evaluated and scored
20 New queries in 2020 and a set of 10 "common" queries from the progress subtask, where all 30 queries will be evaluated and scored
20 New queries in 2021 and a set of 10 "common" queries from the progress subtask, where all 30 queries will be evaluated and scored

2- Progress Subtask (2019 - 2021):

Systems will be asked to return results for 20 common (fixed) queries annually from 2019 to 2021, with the following evaluation schedule:
Although systems submitted results for the 20 common queries in 2019, these were not evaluated that year; NIST saved the runs for use in subsequent years.
NIST will evaluate and score a subset of 10 of the 20 common queries, using the runs submitted in 2019 AND 2020, to allow comparison of performance between the two years.
NIST will evaluate and score the remaining 10 common queries, using the runs submitted in 2019, 2020 AND 2021, to allow comparison of performance across the three years.

Given the above distribution of queries from 2019 to 2021, the total number of queries for which systems should plan to submit results is as follows:
50 (30 New + 20 common) queries in 2019 (PAST)
40 (20 New + 20 common) queries in 2020 (Current Year)
40 (20 New + 20 common) queries in 2021
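The per-year arithmetic above can be checked with a short sketch. The figures are copied from the schedule in this section; the dictionary and function names are hypothetical.

```python
from typing import Dict

# Figures taken from the 2019-2021 schedule described above.
NEW_QUERIES: Dict[int, int] = {2019: 30, 2020: 20, 2021: 20}
COMMON_QUERIES = 20  # fixed progress queries, submitted every year
# Common queries that are also scored as part of the main task each year.
COMMON_IN_MAIN: Dict[int, int] = {2019: 0, 2020: 10, 2021: 10}

def submitted(year: int) -> int:
    """Number of queries a team submits results for in `year`."""
    return NEW_QUERIES[year] + COMMON_QUERIES

def main_task_scored(year: int) -> int:
    """Number of queries scored in the main task in `year`."""
    return NEW_QUERIES[year] + COMMON_IN_MAIN[year]

for year in sorted(NEW_QUERIES):
    print(year, submitted(year), main_task_scored(year))
# 2019 50 30
# 2020 40 30
# 2021 40 30
```

Over the three years this amounts to 130 submitted query result lists per run type, while the main task always scores 30 queries per year.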

Data Resources

The testing dataset
Vimeo Creative Commons Collection (V3C1) consists of 7475 videos (1.3 TB, 1000 hours in total) with a mean video duration of 8 min and a total of 1,082,659 video segments.
The master shot reference and an associated readme file are available for download.
Speech transcripts for the V3C1 dataset, generated using the public Google Cloud Speech-to-Text API, are also available online from this GitHub repository.

The development dataset

2016-2018 Internet Archive (IACC.3) dataset (used by the Ad-hoc Video Search (AVS) task): 4593 Internet Archive videos (144 GB, 600 hours in total) with durations between 6.5 min and 9.5 min. See the information provided here for download instructions and past ground truth.
2013-2015 Internet Archive IACC.2.A, IACC.2.B, and IACC.2.C datasets (used by the Semantic Indexing task), each containing about 200 h drawn from the IACC.2 collection, with video durations ranging from 10 s to 6.4 min. These datasets can be downloaded; see the information provided here.
2010-2012 Internet Archive IACC.1.tv10.training, IACC.1.A, IACC.1.B, and IACC.1.C datasets (used by the Semantic Indexing task), each containing about 200 h drawn from the IACC.1 collection, with video durations ranging from 10 s to just over 3.5 min. These datasets can be downloaded; see the information provided here.
Examples of previous years' Ad-hoc queries are available:
2016 Queries
2017 Queries
2018 Queries
2019 Queries
All past years' ground truth data are available from our past data webpage

Previous collaborative annotations:
The results of past collaborative annotations on Sound and Vision as well as Internet Archive videos from 2007-2013 are available for use in system development.
Sharing of components:
- A common organization, exchange formats, and associated tools will be proposed for sharing elements among interested TRECVID Ad-hoc participants. More information will be made available on a dedicated wiki, accessible to active TRECVID participants.
- The Centre for Research and Technology Hellas (ITI-CERTH) team shared their concept detection scores for the IACC.3 dataset.
- Frame-level CNN features for the IACC.3 dataset, provided by the RUCMM team, are available here.
- A set of classification results, features and high-level analysis of the V3C1 dataset were generated and provided by Klagenfurt University and University of Basel.

Participation types

There will be 3 types of participation:

Participation by submitting only automatic runs for TRECVID evaluation
Participation by submitting only automatic runs for TRECVID evaluation and by using an interactive system during the next VBS (Video Browser Showdown)
Participation by using only an interactive system during the next VBS (Video Browser Showdown)

Important for VBS participants:

VBS participants will use the same V3C1 data. VBS supports two kinds of tasks, Known-item search and Ad-hoc search; participation in either task is optional, and teams may choose to join both. A set of 10 common queries has been identified and will be tested at both TRECVID and VBS, to compare automatic systems, interactive systems, and automatic vs. interactive systems on this fixed query set across 2 or 3 years and so measure system progress.

For questions about participation in the next VBS please contact the VBS organizers: Werner Bailer, Cathal Gurrin, or Klaus Schoeffmann.

Allowed training categories:

The task supports experiments using a "no annotation" condition. The idea is to promote the development of methods that permit the indexing of concepts in video shots using only data from the Web or archives, without the need for additional annotations. The training data could, for instance, consist of images or videos retrieved by a general-purpose search engine (e.g. Google) using only the query definition, with only automatic processing of the returned results.
By "no annotation", we mean that no annotation should be done manually on the retrieved samples (either images or videos). Any annotation done by somebody else prior to the general search does not count. Methods developed in this context could be used to build indexing tools for any concept, starting only from a simple query defined for it. This condition is implemented through additional training-type categories (E and F) alongside the existing A and D categories.

Please note these restrictions and this information on training types.

Run submission types:

Three main submission types will be accepted:

Fully automatic (F) runs (no human input in the loop): the system takes the official query as input and produces results without any human intervention.
Manually-assisted (M) runs: a human may formulate the initial query based on the topic and the query interface, but not on knowledge of the collection or of search results. The system then takes the formulated query as input and produces results without further human intervention.
Relevance-Feedback (R) runs: the system takes the official query as input and produces initial results; a human judge may then assess the top-30 results and input these judgments as feedback to the system to produce a new set of results. This feedback loop is permitted for at most 3 iterations.
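The relevance-feedback protocol (an automatic first pass, then at most 3 rounds in which a human judges only the top 30 results) can be sketched as a loop with the two limits made explicit. The `search` and `judge` callbacks are hypothetical stand-ins for a team's retrieval system and its human assessor; how judgments are used to re-rank is up to each system.

```python
from typing import Callable, Dict, List

TOP_K_JUDGED = 30   # a human may assess the top-30 results per iteration
MAX_ITERATIONS = 3  # the feedback loop is capped at 3 iterations

# Hypothetical callbacks: `search` returns ranked shot IDs given the official
# query and all judgments so far; `judge` returns {shot_id: is_relevant}.
SearchFn = Callable[[str, Dict[str, bool]], List[str]]
JudgeFn = Callable[[str, List[str]], Dict[str, bool]]

def relevance_feedback_run(query: str, search: SearchFn, judge: JudgeFn,
                           top_k: int = TOP_K_JUDGED,
                           max_iterations: int = MAX_ITERATIONS) -> List[str]:
    judgments: Dict[str, bool] = {}
    results = search(query, judgments)         # initial, fully automatic results
    for _ in range(max_iterations):
        shown = results[:top_k]                # only the top-k results are assessed
        judgments.update(judge(query, shown))  # human feedback on those shots
        results = search(query, judgments)     # re-rank using all feedback so far
    return results
```

Enforcing the limits inside the loop, rather than in the callbacks, keeps any search strategy plugged into it within the R-run rules.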

One extra Novelty run (type N) may be submitted within the main task. The goal of this run type is to encourage systems to submit novel and unique relevant shots not easily discovered by other runs. Each team may submit only 1 novelty run. Please note the new required XML field in the DTD indicating whether the run is of novelty or common type.

New optional attributes are available for teams to submit explainability results. These are supported in the XML run files as an optional timestamp and bounding box coordinates (please see the XML DTD container for run files) that localize "why" the submitted shot is relevant to the query. The goal of this initiative is to allow more detailed diagnostic results for teams and NIST, and to assess whether submitted results are relevant for the correct reasons. The submission types (automatic, manually-assisted, relevance feedback) are orthogonal to the training types (A, D, E, F).

Each team may submit a maximum of 4 prioritized runs per submission type and per task type (Main or Progress), plus 2 additional runs if those additional runs are of a "no annotation" training type (E or F) and the others are not. The submission formats are described below.
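A team can sanity-check its run count before submitting with a sketch like the one below. It encodes one reading of the quota rule: a group of runs is fine if it has at most 4 runs, or if at most 4 runs use training types other than E/F and at most 2 use E/F. The function names and the tuple layout are hypothetical, and NIST's own interpretation of the rule takes precedence.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

NO_ANNOTATION_TYPES = {"E", "F"}  # "no annotation" training types
BASE_QUOTA = 4                    # prioritized runs per submission type and task type
EXTRA_NO_ANNOTATION = 2           # additional runs allowed for E/F training types

def quota_ok(training_types: List[str]) -> bool:
    """One reading of the quota rule for a single (submission type, task type) group."""
    no_ann = sum(t in NO_ANNOTATION_TYPES for t in training_types)
    other = len(training_types) - no_ann
    # Either the group fits in the base quota, or the non-E/F runs fit in the
    # base quota and the extra runs are all E/F.
    return (len(training_types) <= BASE_QUOTA
            or (other <= BASE_QUOTA and no_ann <= EXTRA_NO_ANNOTATION))

def check_team(runs: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], bool]:
    """`runs` holds (submission_type, task_type, training_type) triples, e.g. ("M", "Main", "D")."""
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for submission_type, task_type, training_type in runs:
        groups[(submission_type, task_type)].append(training_type)
    return {key: quota_ok(types) for key, types in groups.items()}
```

Note that the letter F appears both as a submission type (fully automatic) and as a training type; the tuple positions keep the two apart.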

Please note: only submissions that are valid when checked against the supplied DTDs will be accepted. You must check your submission, and correct it if needed, before submitting it. NIST reserves the right to reject any submission that does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.

Run submission format:

In each run, participants will submit results against the V3C1 data for all and only the 30 main queries or the 20 progress queries (10 of which are shared with the main task) released by NIST, with at most 1000 shot IDs per query.
The DTD includes an XML field to determine the task type (Main or Progress) as well as the run types. The Main and Progress tasks each have their own set of query IDs to use within run submissions.
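The "all and only" and "at most 1000 shot IDs" constraints can be checked on a run before it is serialized to XML. This sketch assumes the run is held in memory as a mapping from query ID to ranked shot IDs; `validate_run` and the query IDs in any test data are hypothetical, and the official query ID set must come from the NIST release for the relevant task type.

```python
from typing import Dict, List, Set

def validate_run(run: Dict[str, List[str]], expected_queries: Set[str],
                 max_shots: int = 1000) -> List[str]:
    """Return a list of problems found in one run (an empty list means none found).

    `run` maps a query ID to its ranked shot IDs; `expected_queries` is the
    official query ID set for the run's task type (Main or Progress).
    """
    problems: List[str] = []
    submitted = set(run)
    missing = expected_queries - submitted
    extra = submitted - expected_queries
    if missing:
        problems.append(f"missing queries: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected queries: {sorted(extra)}")
    for query_id, shots in run.items():
        if len(shots) > max_shots:
            problems.append(f"{query_id}: {len(shots)} shots (max {max_shots})")
        if len(set(shots)) != len(shots):
            problems.append(f"{query_id}: duplicate shot IDs")
    return problems
```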
Available here for download (right-click and choose "display page source" to see the entire files) are a DTD for Ad-hoc search results of one main run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check all your submissions to see that they are well-formed.
Please submit each of your runs in a separate file, named to make clear which team has produced it. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement:

that refers to the DTD at NIST via a URL and with a videoAdhocSearchResults element even though there is only one run included. Each submitted file must be compressed using just one of the following: gzip, tar, zip, or bzip2.
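A quick pre-submission step can confirm well-formedness and produce the compressed file. This sketch uses only the Python standard library and assumes `videoAdhocSearchResults` is the document root, as the DOCTYPE requirement suggests; `check_and_compress` is a hypothetical name. ElementTree checks well-formedness only and does not validate against the NIST DTD, so a validating checker such as the Xerces-J command mentioned earlier is still needed.

```python
import gzip
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path

def check_and_compress(run_file: str) -> Path:
    """Confirm the run file is well-formed XML, then write a gzip copy next to it."""
    root = ET.parse(run_file).getroot()  # raises ET.ParseError if not well-formed
    if root.tag != "videoAdhocSearchResults":
        raise ValueError(f"unexpected root element: {root.tag}")
    out_path = Path(run_file + ".gz")
    with open(run_file, "rb") as src, gzip.open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream the original bytes into the archive
    return out_path
```

gzip is used here only because it is one of the accepted formats; zip, tar, or bzip2 would work the same way.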
Remember to use the correct shot IDs in your submissions. The shot IDs take the form "shotXXXX_YY", where XXXX is the original video ID and YY is the segmented shot ID. Please don't use any keyframe-associated file names in your submissions. Please consult the V3C1 readme file for more information about submitting the correct shot file names, and the master shot reference for the V3C1 dataset.
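Shot IDs can be checked against the "shotXXXX_YY" form before submission. The pattern below is an assumption based on that description (digits for both the video ID and the shot number); the V3C1 readme remains the authority on the exact format, and `parse_shot_id` is a hypothetical helper.

```python
import re
from typing import Tuple

# Assumed pattern: "shot", digits for the video ID, "_", digits for the shot number.
SHOT_ID = re.compile(r"shot(\d+)_(\d+)")

def parse_shot_id(shot_id: str) -> Tuple[str, str]:
    """Split a shot ID into (video_id, shot_number); reject anything else.

    Strings with extra suffixes, such as image file extensions, fail the full match.
    """
    match = SHOT_ID.fullmatch(shot_id)
    if match is None:
        raise ValueError(f"not a valid shot ID: {shot_id!r}")
    # Keep both parts as strings so leading zeros in the video ID are preserved.
    return match.group(1), match.group(2)
```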
Please note the new *optional* explainability attributes for each submitted shot, indicating a timestamp (in seconds) of a frame within the shot and bounding box coordinates.
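Before attaching explainability attributes, a team might check that the timestamp actually falls inside the submitted shot and that the box lies inside the frame. In this sketch the field names, the corner-coordinate box convention, and the availability of shot start/end times in seconds (for example, derived from the master shot reference) are all assumptions; the DTD defines the real attribute names and conventions.

```python
from dataclasses import dataclass

@dataclass
class Explanation:
    """Hypothetical container for the optional explainability attributes."""
    timestamp: float  # seconds; should point at a frame inside the submitted shot
    x_min: float      # bounding box, assumed here as pixel corner coordinates
    y_min: float
    x_max: float
    y_max: float

def explanation_ok(exp: Explanation, shot_start: float, shot_end: float,
                   frame_width: int, frame_height: int) -> bool:
    """True if the timestamp lies within the shot and the box lies within the frame."""
    in_shot = shot_start <= exp.timestamp <= shot_end
    box_ok = (0 <= exp.x_min < exp.x_max <= frame_width
              and 0 <= exp.y_min < exp.y_max <= frame_height)
    return in_shot and box_ok
```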
Please submit your runs using this password-protected webpage

Evaluation:

All 30 main queries in 2020 (including 10 queries from the progress subtask) will be evaluated by assessors at NIST after pooling and sampling.

For the progress subtask, a set of 10 queries will be evaluated in 2020 (after pooling and sampling using 2019 and 2020 runs), while the other 10 queries will be evaluated in 2021 (using 2019 - 2021 runs).

Please note that NIST uses a number of rules in manual assessment of system output.

Measures:

Mean extended inferred average precision (mean xinfAP), which allows the sampling density to vary, e.g. so that it can be 100% in the top strata, which are the most important for average precision.
As in past years, other detailed measures based on recall and precision will be provided by the sample_eval software.
Speed will also be measured: the clock time per query search, reported in seconds (to one decimal place), must be provided in each run.
A special metric will be developed to score Novelty runs such that more credit can be given to unique shots.
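xinfAP estimates average precision from stratified samples of judged shots, and NIST computes it with the sample_eval software. As a point of reference only, when every shot is judged (100% sampling), the quantity being estimated is ordinary average precision, sketched below. This is not the xinfAP estimator itself, and the function names are hypothetical.

```python
from typing import Dict, List, Set

def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision of one ranked list, given the complete set of relevant shots."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant shot
    return precision_sum / len(relevant)  # relevant shots never retrieved contribute 0

def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query AP over all judged queries (a missing query scores 0)."""
    return sum(average_precision(run.get(q, []), rel) for q, rel in qrels.items()) / len(qrels)
```

For example, a list ranking relevant shots at positions 1 and 3 out of 2 relevant shots scores (1/1 + 2/3) / 2, or about 0.833.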

Open Issues:

Feedback from participants is needed to finalize the relevance feedback (RF) run conditions: the number of top results assessed (X, currently 30) and the number of iterations (currently 3).
Participants are welcome to provide feedback about ideas for concept bank fusion and how teams can collaborate towards that goal.

Digital Video Retrieval at NIST