The previous Semantic Indexing task (run from 2010 to 2015) addressed the problem of automatically assigning predefined semantic tags, representing visual or multimodal concepts, to video segments. The Ad-hoc search task, started in 2016, models the end-user search use case: a user looking for segments of video containing persons, objects, activities, locations, etc., and combinations of these.
In 2018 the task will again support experiments in the no-annotation condition. The idea is to promote the development of methods that permit the indexing of concepts in video shots using only data from the Web or archives, without the need for additional annotations. The training data could, for instance, consist of images or videos retrieved by a general-purpose search engine (e.g. Google) using only the query definition, with only automatic processing of the returned results. This will not be implemented as a new variant of the task, but by using additional categories for the training types besides the A and D ones (see below). By "no annotation", we mean that no annotation should be done manually on the retrieved samples (either images or videos); any annotation done by somebody else prior to the general search does not count. Methods developed in this context could be used for building indexing tools for any concept, starting only from a simple query defined for it.
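The following is a minimal sketch of what such a no-annotation training-data pipeline might look like. The function fetch_image_urls is a hypothetical placeholder for whatever general-purpose search-engine access a team chooses; it is not part of the task and is shown only to illustrate that the retrieved samples are used as-is, with no manual relabeling.

    # Hedged sketch: build noisy training data for one Ad-hoc query definition
    # using only automatic processing of search-engine results.
    import urllib.request
    from pathlib import Path

    def fetch_image_urls(query_text, max_results=200):
        """Hypothetical helper: return image URLs for the raw query text."""
        raise NotImplementedError("plug in your own search-engine access here")

    def build_training_set(query_text, out_dir="web_training_data"):
        out = Path(out_dir) / query_text.replace(" ", "_")
        out.mkdir(parents=True, exist_ok=True)
        for i, url in enumerate(fetch_image_urls(query_text)):
            try:
                # Downloaded results are kept as-is; under the no-annotation
                # condition no manual filtering of these samples is allowed.
                urllib.request.urlretrieve(url, out / f"{i:05d}.jpg")
            except OSError:
                continue  # skip unreachable URLs, still fully automatic
        return out

    # build_training_set("a person riding a bicycle at night")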
Given the test collection (IACC.3), the master shot reference, and the set of Ad-hoc queries (approx. 30 queries) released by NIST, return for each query a list of at most 1000 shot IDs from the test collection, ranked according to their likelihood of containing the target of the query.
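As a simple illustration of this output protocol, the sketch below shapes arbitrary system scores into per-query ranked lists capped at 1000 shots. The input format (a mapping from query ID to shot-ID confidence scores) and the example IDs are assumptions made only for the illustration.

    # Hedged sketch: keep at most 1000 shot IDs per query, ranked by score.
    def rank_results(scores, max_shots=1000):
        ranked = {}
        for query_id, shot_scores in scores.items():
            ordered = sorted(shot_scores, key=shot_scores.get, reverse=True)
            ranked[query_id] = ordered[:max_shots]  # at most 1000 shot IDs
        return ranked

    # Example with made-up query and shot IDs:
    # rank_results({"q1": {"shot12345_6": 0.91, "shot10021_3": 0.40}})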
The current test data set (IACC.3) consists of 4593 Internet Archive videos (144 GB, 600 total hours) with durations between 6.5 and 9.5 minutes.
The development data set combines the development and test data sets of:
Examples of previous Ad-hoc queries (used in 2016 and 2017) can be found here and here.
There will be 3 types of participation:
The same IACC.3 data will be used by VBS participants. VBS supports two kinds of tasks, Known-item search and Ad-hoc search; participation in either task is optional, and teams may choose to join both. Interactive systems at VBS joining the Ad-hoc task will be tested in real time on a subset of randomly selected queries (from the 30 selected for TRECVID 2018). For questions about participation in the next VBS, please contact the VBS organizers: Werner Bailer, Cathal Gurrin, or Klaus Schoeffmann.
Please note these restrictions and this information on training types. The submission types (automatic and manually-assisted) are orthogonal to the training types (A, D, E, F ...).
Three main submission types will be accepted:
Each team may submit a maximum of 4 prioritized runs per submission type, with 2 additional runs allowed if they are of the "no annotation" training type and the other runs are not. The submission formats are described below.
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmission.xml.
Each run file must include a DOCTYPE declaration that refers to the DTD at NIST via a URL, and a videoAdhocSearchResults element, even though there is only one run included. Each submitted file must be compressed using just one of the following: gzip, tar, zip, or bzip2.
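For teams preferring a scripted check, the sketch below validates a run file against the DTD referenced in its DOCTYPE, assuming Python with the lxml package as an alternative to the Xerces-J check shown above. The file path and the expectation on the root element are only illustrative.

    # Hedged sketch: DTD-validate a submission file before uploading it.
    from lxml import etree

    def check_submission(path):
        # no_network=False allows fetching the NIST DTD from the DOCTYPE URL.
        parser = etree.XMLParser(dtd_validation=True, load_dtd=True, no_network=False)
        try:
            tree = etree.parse(path, parser)
        except etree.XMLSyntaxError as err:
            print("INVALID:", err)
            return False
        print("root element:", tree.getroot().tag)  # expected: videoAdhocSearchResults
        return True

    # check_submission("YourSubmission.xml")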
All queries (approx. 30) will be evaluated by assessors at NIST after pooling and sampling.
Please note that NIST uses a number of rules in the manual assessment of system output.