Instance Search (INS)

Task Coordinators: Wessel Kraaij and Keith Curtis

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a specific person, object, or place, given a visual example. From 2010 to 2015, the instance search task tested systems on retrieving specific instances of objects, persons, and locations. From 2016 to 2018, a new query type asked systems to retrieve specific persons in specific locations.

In 2019, a *new query type* was introduced to ask systems to retrieve specific persons doing specific actions. The task supports fully automatic as well as interactive systems with timed humans in the loop. The task will again use the EastEnders data with support from the British Broadcasting Corporation (BBC).

System Task

Given a collection of BBC Eastenders test videos, a master shot boundary reference, a collection of topics (queries) that delimit a person in some example images and videos, and a set of predefined actions with example images and/or videos, locate for each topic up to 1000 shots most likely to contain a recognizable instance of the person doing one of the predefined actions. The set of actions defines daily life atomic interactions between people and objects (e.g. person opening a door), people with other people (e.g. person shaking hands), or people alone (e.g. person shouting). The development of fast AND effective search methods is encouraged.
A new optional component of the task this year allows systems to report "why" they believe a submitted shot may be relevant to the query. This is done through optional explainability attributes attached to each submitted shot in the XML run file (the timestamp of a frame within the shot and the associated bounding box coordinates).

This year a continuation of the two evaluation tasks started in 2019 will follow:

1- Main Task:

30 queries including 20 New queries in 2020 and a subset of 10 queries from the progress subtask, where all 30 queries will be evaluated and scored
30 queries including 20 New queries in 2021 and a subset of 10 queries from the progress subtask, where all 30 queries will be evaluated and scored

2- Progress Subtask:

Systems will be asked to return results for 20 common (fixed) queries annually from 2019 to 2021, where the evaluation schedule is as follows:
Although systems submitted results for the 20 common queries in 2019, no evaluation of them was conducted in 2019. NIST just saved the runs for subsequent years.
NIST will evaluate and score a subset of 10 of the 20 common queries submitted in 2019 AND 2020 to allow comparison of performance between the two years.
NIST will evaluate and score the other subset of 10 of the 20 common queries submitted in 2019, 2020 AND 2021 to allow comparison of performance across the three years.

Given the above schedule of query distribution from 2019 to 2021, in total systems should submit results for:
50 (30 New + 20 common) queries in 2019
40 (20 New + 20 common) queries in 2020
40 (20 New + 20 common) queries in 2021
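
The per-year totals above follow directly from the query distribution schedule. As a quick sanity check, here is a minimal sketch; the constant and function names are illustrative only and are not part of the task definition.

```python
# Illustrative names only; the numbers are taken from the schedule above.
NEW_QUERIES = {2019: 30, 2020: 20, 2021: 20}   # new (main-task) queries per year
COMMON_QUERIES = 20                            # fixed progress queries, every year

# Progress queries that NIST scores each year (none were scored in 2019).
PROGRESS_SCORED = {2019: 0, 2020: 10, 2021: 10}


def submitted(year: int) -> int:
    """Total number of queries a system returns results for in a given year."""
    return NEW_QUERIES[year] + COMMON_QUERIES


for year in sorted(NEW_QUERIES):
    print(year, submitted(year))

# Every common query is scored exactly once over the three years.
print(sum(PROGRESS_SCORED.values()) == COMMON_QUERIES)
```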

Data Resources

The testing dataset
About 244 video files (300 GB, 464 h) of BBC EastEnders video in MPEG-4 format. See here for information on how to get a copy of the test data. Transcript files (English) of the videos can be accessed from DCU and also from the active participants area here.

The development dataset

A very small sample (File ID=0) of the BBC Eastenders test data will be available from Dublin City University. No actual development data will be supplied. File 0 is therefore NOT part of the test data and no shots from File 0 should be part of any submission.
Auxiliary data: Participants are allowed to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the active participants mailing list as soon as they discover them.
Teams are responsible for, and encouraged to undertake, collecting their own action development data for training purposes. The action examples provided by NIST are only meant to demonstrate the type of action required in the query. As the examples cannot include all possible variations of action appearance, there will be a textual definition to define the scope of the required action.
A list of 25 actions, from which a subset will be used for evaluation in the final testing topics, together with definitions (descriptions) of what is relevant in the scope of those actions, is available here.
A sample of action video examples, with a readme file explaining the mapping between each action and its corresponding video examples, is available here. These are only available to current-year active participants. Please use the username/password of the active participants area to access the data.

Topics (Queries):
Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos, containing the person of interest in a variety of different appearances to the extent possible, in addition to the name of one action. Example images/videos for the set of actions will be given to participants as well. Along with the action examples, there will be a textual definition of the action to explain the scope of the true positive action and what can or cannot be included.
For each frame image (of a target person) there will be a binary mask of the region of interest (ROI), as bounded by a single polygon, and the ID from the master shot reference of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we will assume the searcher wants to keep the process simple. So, the ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. In addition to example images of the person of interest, the shot videos from which the images were taken will also be given as video examples. The shots from which example images are drawn for a given topic will be filtered by NIST from system submissions for that topic before evaluation if they satisfy the query (target person doing the required action).
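
Because the ROI is a binary mask bounded by a single polygon, a system might first reduce it to a rectangular crop region before extracting features for the target person. The following is a minimal sketch under stated assumptions: the mask is assumed to be already decoded into rows of 0/1 values (reading the bmp files is not shown), and the function name `mask_bounding_box` is hypothetical.

```python
from typing import List, Optional, Tuple


def mask_bounding_box(mask: List[List[int]]) -> Optional[Tuple[int, int, int, int]]:
    """Return (left, top, right, bottom), inclusive, of the nonzero mask pixels.

    Returns None when the mask is empty (no ROI pixels at all).
    """
    # Row indices that contain at least one ROI pixel.
    rows = [y for y, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    # Column indices of every ROI pixel, across all rows.
    cols = [x for row in mask for x, value in enumerate(row) if value]
    return min(cols), rows[0], max(cols), rows[-1]


# Toy 4x5 mask; 1 marks pixels inside the ROI polygon.
example_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
print(mask_bounding_box(example_mask))  # (1, 1, 3, 3)
```

Since the ROI may include non-target pixels, a box derived this way is only a coarse localization of the person.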
Here is an example of a set of topics and here is a pointer to the DTD for an instance search topic (you may need to right click on "view source").
Sharing of components:
- Docker image tools for development are available here. Contact the author Robert Manthey if you have questions using them.
- We encourage teams to share development resources with other active participants to expedite system development.

Run submission types

There will be 2 types of runs that teams can submit for evaluation:

Fully automatic (F) runs: The system takes the official query/topic as input and produces results without any human intervention.
Interactive (I) runs (humans in the loop): The system takes the official query/topic as input and produces results where humans can filter or rerank search results for up to 5 elapsed minutes per search, with 1 user per system run.

In both of the above run types, all provided official query image/video examples should be frozen, with no human modifications to them.

Allowed training data source and categories:

Each run must report the type of training data used for person recognition from the following two categories (see the DTD):

"A" - Training data used one or more of the provided official query images only - no video.
"E" - Training data used video examples (+ optionally image examples)

Each run will also be required to state the source of training data used from the following options (see the DTD):

A- Only sample video 0
B- Other external data only
C- Only provided images/videos in the official query
D- Sample video 0 AND provided images/videos in the official query (A+C)
E- External data AND NIST provided data (sample video 0 OR official query images/videos)
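
Note that the letters "A" and "E" mean different things in the two schemes above (training type versus training source). As an illustration of how the two codes relate to what a run actually used, here is a minimal sketch; the function and parameter names are hypothetical, and the DTD remains the authoritative definition of the allowed values.

```python
def training_type_code(used_video_examples: bool) -> str:
    """Training type for person recognition: "E" if video examples were used, else "A"."""
    return "E" if used_video_examples else "A"


def training_source_code(sample_video_0: bool, query_examples: bool, external: bool) -> str:
    """Map the data sources a run trained on to the source letters A-E listed above."""
    if external:
        # External data combined with any NIST-provided data is "E"; alone it is "B".
        return "E" if (sample_video_0 or query_examples) else "B"
    if sample_video_0 and query_examples:
        return "D"
    if sample_video_0:
        return "A"
    if query_examples:
        return "C"
    raise ValueError("at least one training data source must be selected")


# A run trained on the official query examples plus external data.
print(training_source_code(sample_video_0=False, query_examples=True, external=True))  # E
```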

Run submission format:

In each run, participants will submit results against the BBC Eastenders dataset for all and only the 20 main queries or the 20 progress queries released by NIST, with at most 1000 shot IDs per query.
Please note the new optional explainability attributes in the XML DTD, in the form of a timestamp of a frame within the submitted shot and a bounding box indicating "why" the system believes the submitted shot is relevant to the query.
Each team may submit a maximum of 4 prioritized runs per submission type ("F" or "I") and per training type category ("A" or "E") allowing up to 8 runs in one specific case (submitting 4 "A" runs and 4 "E" runs) for an automatic (F) or interactive (I) system. All runs will be evaluated but not all may be included in the pools for judgment.
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
Here for download (right click and choose "display page source" to see the entire file) is the DTD for search results of one run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed.
Please submit each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSearchResults element, even if only one run is included.
Submissions will be transmitted to NIST using this password-protected webpage.
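
Before uploading, a team may want to run a few local pre-checks that mirror the limits above. The sketch below is illustrative only: it checks XML well-formedness with Python's standard library (which does NOT validate against the DTD, so a DTD-aware checker such as the Xerces-J command above is still required), counts runs per submission type and training type, and caps a result list at 1000 shots. All function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from typing import Iterable, List, Sequence, Tuple

MAX_RUNS_PER_GROUP = 4      # per submission type (F/I) and training type (A/E)
MAX_SHOTS_PER_QUERY = 1000  # at most 1000 shot IDs per query


def is_well_formed(xml_text: str) -> bool:
    """True if the text parses as XML. This is not DTD validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def over_limit_groups(runs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the (submission type, training type) groups that exceed 4 runs."""
    counts = Counter(runs)
    return sorted(group for group, n in counts.items() if n > MAX_RUNS_PER_GROUP)


def cap_shots(ranked_shot_ids: Sequence[str]) -> List[str]:
    """Keep only the top-ranked shot IDs allowed for one query."""
    return list(ranked_shot_ids)[:MAX_SHOTS_PER_QUERY]


# Four automatic "A" runs and four automatic "E" runs: the allowed 8-run case.
planned_runs = [("F", "A")] * 4 + [("F", "E")] * 4
print(over_limit_groups(planned_runs))                                # []
print(is_well_formed("<videoSearchResults></videoSearchResults>"))  # True
```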

Evaluation:

All 30 queries for 2020 (20 new queries + 10 progress queries) will be evaluated by assessors at NIST after pooling and sampling.
In 2020, a subset of 10 progress queries will be evaluated (using 2019 and 2020 runs), while the other 10 queries will be evaluated in 2021 (using 2019 - 2021 runs).
NIST will not be able to assess the optional explainability results; the goal of this initiative is to serve as a diagnostic tool that lets systems and NIST check whether the submitted shots are relevant for the correct reasons.
Please note that NIST uses a number of rules in manual assessment of system output.

Measures:

This task will be treated as a form of search and will accordingly be evaluated with average precision for each topic in each run and per-run mean average precision over all topics.
As in past years, other detailed measures based on recall and precision will be provided by the trec_eval_video software.
Speed will also be measured: clock time per query search, reported in seconds (to one decimal place), must be provided in each run.
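
For reference, the two headline measures can be sketched as follows. This is the textbook non-interpolated average precision, not the official scorer: official numbers come from trec_eval_video, and any effect of pooling and sampling on the judgments is not modeled here. The function names and toy shot IDs are illustrative, and the ranked list is assumed to contain no duplicate shot IDs.

```python
from typing import Dict, Sequence, Set


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Non-interpolated AP: mean of precision at each relevant shot's rank,
    divided by the total number of relevant shots for the topic."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, Sequence[str]],
                           judgments: Dict[str, Set[str]]) -> float:
    """Per-run MAP over all judged topics; a topic with no results scores 0."""
    scores = [average_precision(run.get(topic, []), rel)
              for topic, rel in judgments.items()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy topic: relevant shots found at ranks 1 and 3, one relevant shot missed.
ap = average_precision(["s1", "s2", "s3", "s4"], {"s1", "s3", "s5"})
print(round(ap, 4))  # 0.5556

# Elapsed search time is reported in seconds to one decimal place.
print(f"{12.3456:.1f}")  # 12.3
```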

Important notes

The BBC requires all INS task participants to fill in, sign, and submit a renewal data license agreement in order to use the Eastenders data. That means that even if a past participant has a copy of the data, the team must submit a renewal license form before any submission runs can be accepted and evaluated.
No usage of previous years' ground truth is allowed in order to filter the current year's search results.
No prior human knowledge of the closed world of the Eastenders dataset may be used to filter search results. All filtering methods should be automatic, without fine-tuning based on human knowledge of the Eastenders dataset.
No manual intervention is allowed to modify the example images of the testing topics. Only automatic methods are allowed.
Use of the included XML transcript files is limited to the transcribed text only, and not to any other metadata (or XML) attributes (e.g. color of text, etc).
Interactive systems essentially use humans to filter or rerank search results, but not to modify testing topics in a preprocessing step.

Open Issues:

The BBC Eastenders data license is still being coordinated with the BBC. All active participants will be informed when it is ready so that they can submit a signed data agreement and download the data.

Digital Video Retrieval at NIST