TRECVID 2018 guidelines

Instance Search

Task coordinators: George Awad and Wessel Kraaij

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a specific person, object, or place, given a visual example. From 2010 to 2015 the instance search task tested systems on retrieving specific instances of objects, persons, and locations. In 2016-2017, a new query type was introduced that asked systems to retrieve specific persons in specific locations.

In 2018 NIST will create about 30 topics, of which the first 21 will be used for interactive systems. The task will again use the EastEnders data, prepared with major help from several participants in the AXES project (Access to Audiovisual Archives), a four-year FP7 framework research project to develop tools that provide various types of users with new and engaging ways to interact with audiovisual libraries.

System task:

Given a collection of test videos, a master shot reference, a set of known location/scene example images and videos, and a collection of topics (queries) that delimit a person in some example images and videos, locate for each topic up to the 1000 shots most likely to contain a recognizable instance of the person in one of the known locations. Interactive runs are welcome and will likely return many fewer than 1000 shots. The development of fast AND effective search methods is encouraged.
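As a minimal sketch of the "up to 1000 shots" limit described above, the following Python function keeps the highest-scoring shots for one topic. The function name, the score dictionary, and the shot ID strings are illustrative assumptions, not part of the task definition.

```python
import heapq

# Limit stated in the system task: at most 1000 shots per topic.
MAX_SHOTS_PER_TOPIC = 1000


def top_shots(scores, limit=MAX_SHOTS_PER_TOPIC):
    """Return shot IDs ordered by descending score, at most `limit` of them.

    `scores` is a hypothetical mapping from shot ID to a system confidence
    score for one topic; higher means more likely to contain the person
    in the named location.
    """
    best = heapq.nlargest(limit, scores.items(), key=lambda item: item[1])
    return [shot_id for shot_id, _ in best]
```

An interactive run could pass a smaller `limit`, since such runs are expected to return far fewer shots.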

Data:

Development data: A very small sample (File ID=0) of the BBC EastEnders test data will be available from Dublin City University. No actual development data will be supplied. File 0 is therefore NOT part of the test data and no shots from File 0 should be part of any submission.
Test data: The test data for 2018 will be BBC EastEnders video in MPEG-4 format. See here for information on how to get a copy of the test data.
Topics: Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos, plus the name of one location. The images will show the person of interest in as wide a variety of appearances as possible. Example images/videos for the set of master locations will be given to participants as well.
For each frame image there will be a binary mask of the region of interest (ROI), bounded by a single polygon, and the ID, from the master shot reference, of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we will assume the searcher wants to keep the process simple. So, the ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. In addition to example images of the person of interest, the shot videos from which the images were taken will also be given as video examples. The shots from which example images are drawn for a given topic will be filtered by NIST from system submissions for that topic before evaluation.
Here is an example of a set of topics and here is a pointer to the DTD for an instance search topic (you may need to right click on "view source").
Auxiliary data: Participants are allowed to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the tv18.list as soon as they discover them.
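The two exclusions described in the data notes above (no shots from File 0, and removal of the shots that supplied a topic's example images) can be sketched as a local pre-submission filter. NIST performs the example-shot filtering itself, so this is only a convenience check. The shot ID layout "shot<file>_<index>" and all function names are assumptions made for illustration.

```python
def file_id_of(shot_id):
    """Extract the file ID from a shot ID.

    Assumes a layout like "shot12_345" (file 12, shot 345); adjust this
    to match the actual master shot reference.
    """
    name = shot_id[4:] if shot_id.startswith("shot") else shot_id
    return name.split("_", 1)[0]


def filter_ranked_shots(ranked_shots, example_shots, sample_file_id="0"):
    """Drop File 0 shots and the topic's own example shots, keeping order."""
    excluded = set(example_shots)
    kept = []
    for shot_id in ranked_shots:
        if file_id_of(shot_id) == sample_file_id:
            continue  # File 0 is the sample file, not test data
        if shot_id in excluded:
            continue  # shot supplied an example image for this topic
        kept.append(shot_id)
    return kept
```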

Submissions:

We will allow teams to submit multiple runs (to be counted only as one against the maximum allowed) as long as those runs differ only in what set of training examples for a topic are used. The sets will be defined as follows (in the DTD):

A - One or more provided images - no video
E - Video examples (+ optionally image examples)

Each run will also be required to state the source of training data used from the following options (in the DTD):

A- Only sample video 0
B- Other external data
C- Only provided images/videos in the query
D- Sample video 0 AND provided images/videos in the query (A+C)
E- External data AND NIST provided data (sample video 0 OR query images/videos)

Each team may submit a maximum of 4 prioritized runs per training example set (note the example set exception mentioned above allowing up to 8 runs in one specific case). All runs will be evaluated but not all may be included in the pools for judgment.
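A team might sanity-check its planned runs against the options above before building submission files. The sketch below reads the "4 runs per training example set" limit literally and does not model the exception for runs that differ only in example set. The tuple layout and names are hypothetical; the authoritative definitions are in the DTD.

```python
from collections import Counter

# Option letters as listed in these guidelines.
EXAMPLE_SETS = {"A", "E"}
DATA_SOURCES = {"A", "B", "C", "D", "E"}
MAX_RUNS_PER_EXAMPLE_SET = 4


def check_runs(runs):
    """Return a list of problems for `runs`.

    `runs` is a list of (run_name, example_set, data_source) tuples,
    a hypothetical in-memory description of a team's planned runs.
    """
    problems = []
    counts = Counter()
    for name, example_set, data_source in runs:
        if example_set not in EXAMPLE_SETS:
            problems.append("%s: unknown example set %r" % (name, example_set))
        if data_source not in DATA_SOURCES:
            problems.append("%s: unknown data source %r" % (name, data_source))
        counts[example_set] += 1
    for example_set, count in sorted(counts.items()):
        if count > MAX_RUNS_PER_EXAMPLE_SET:
            problems.append(
                "example set %s: %d runs exceeds the limit of %d"
                % (example_set, count, MAX_RUNS_PER_EXAMPLE_SET)
            )
    return problems
```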

Submissions will be identified as either fully automatic or interactive. Interactive runs will be limited to 5 elapsed minutes per search and 1 user per system run.

Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmission.xml.

Here for download (right click and choose "display page source" to see the entire file) is the DTD for search results of one run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed.

Please submit each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSearchResults element even if only one run is included:

<!DOCTYPE videoSearchResults SYSTEM "https://www-nlpir.nist.gov/projects/tv2018/dtds/videoSearchResults.dtd">
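Before running a validating checker, a quick local pre-check can catch the most basic problems. The sketch below uses only the Python standard library to confirm that the DOCTYPE statement is present, that the file is well-formed XML, and that the root element is videoSearchResults. It does NOT validate against the DTD, so it complements rather than replaces a validator such as Xerces-J. The function names are illustrative, and the DOCTYPE test is looser than "begins with": it only checks that the statement appears somewhere in the file.

```python
import xml.etree.ElementTree as ET

EXPECTED_DOCTYPE = (
    '<!DOCTYPE videoSearchResults SYSTEM '
    '"https://www-nlpir.nist.gov/projects/tv2018/dtds/videoSearchResults.dtd">'
)


def check_submission_text(text):
    """Return a list of problems found in the submission text (empty if none)."""
    problems = []
    if EXPECTED_DOCTYPE not in text:
        problems.append("missing the expected DOCTYPE statement")
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        problems.append("not well-formed XML: %s" % exc)
        return problems
    if root.tag != "videoSearchResults":
        problems.append("root element is %r, expected 'videoSearchResults'" % root.tag)
    return problems


def check_submission_file(path):
    """Read a submission file and run the same checks on its contents."""
    with open(path, encoding="utf-8") as handle:
        return check_submission_text(handle.read())
```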

Submissions will be transmitted to NIST via a password-protected webpage.

Evaluation:

This task will be treated as a form of search and will accordingly be evaluated with average precision for each topic in each run and per-run mean average precision over all topics. Speed will also be measured: clock time per topic search, reported in seconds (to one decimal place).
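For reference, the measures named above can be sketched as follows, using the usual non-interpolated definition of average precision: the mean of the precision values at each rank where a relevant shot appears, divided by the total number of relevant shots. This is an illustration only; the official scoring tool may differ in details such as handling of duplicate or unjudged shots. All function and variable names are assumptions.

```python
import time


def average_precision(ranked_shots, relevant_shots):
    """Non-interpolated average precision for one topic in one run."""
    if not relevant_shots:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_shots, start=1):
        if shot_id in relevant_shots:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(relevant_shots)


def mean_average_precision(results_by_topic, truth_by_topic):
    """Mean of per-topic average precision over all topics with ground truth."""
    topics = list(truth_by_topic)
    if not topics:
        return 0.0
    total = sum(
        average_precision(results_by_topic.get(topic, []), truth_by_topic[topic])
        for topic in topics
    )
    return total / len(topics)


def timed_search(search_fn, topic):
    """Run one search and return (result, clock seconds to one decimal place)."""
    start = time.perf_counter()
    result = search_fn(topic)
    elapsed = round(time.perf_counter() - start, 1)
    return result, elapsed
```

A topic for which a run returns nothing contributes an average precision of 0 to the mean, so skipping topics lowers the per-run score.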

Important notes

The BBC requires all INS task participants to fill in, sign, and submit a renewal data license agreement in order to use the EastEnders data. That means that even if a past participant has a copy of the data, the team must submit a renewal license form before any submission runs can be accepted and evaluated.

Ground truth from previous years may not be used to filter the current year's search results.

No human prior knowledge of the closed world of the EastEnders dataset may be used to filter search results. All filtering methods must be automatic, without fine-tuning based on human knowledge of the EastEnders dataset.

No manual intervention is allowed to modify the example images of the test topics. Only automatic methods are allowed.

Use of the included XML transcript files is limited to the transcribed text only, and not to any other metadata or XML attributes (e.g., color of text).

Interactive systems essentially use humans to filter or rerank search results, but not to modify testing topics in a preprocessing step.

Issues:

Should a new "Manual" run category be added in addition to automatic and interactive?