TRECVID 2015 Guidelines

Guidelines for TRECVID 2015

(last updated: )

0. Table of Contents:

Introduction
Video data
Data use agreements handled by NIST
Task - Semantic indexing (SIN)
Task - Surveillance event detection (SED)
Task - Instance search (INS)
Task - Multimedia event detection (MED)
Task - Localization (LOC)
Task - Video Hyperlinking (LNK)
Schedule
Outstanding 2015 guideline work items
Active participant's area
Contacts, email lists

1. Introduction:

The main goal of the TREC Video Retrieval Evaluation (TRECVID) is to promote progress in content-based analysis of and retrieval from digital video via open, metrics-based evaluation. TRECVID is a laboratory-style evaluation that attempts to model real world situations or significant component tasks involved in such situations.

Then ...

Up until 2010, TRECVID used test data from a small number of known professional sources - broadcast news organizations, TV program producers, and surveillance systems - that imposed limits on program style, content, production qualities, language, etc. In 2003 - 2006 TRECVID supported experiments in automatic segmentation, indexing, and content-based retrieval of digital video using broadcast news in English, Arabic, and Chinese. TRECVID also completed two years of pilot studies on exploitation of unedited video rushes provided by the BBC. In 2007 - 2009 TRECVID provided participants with cultural, news magazine, documentary, and education programming supplied by the Netherlands Institute for Sound and Vision. Tasks using this video included segmentation, search, feature extraction, and copy detection. Systems were tested in rushes video summarization using the BBC rushes. Surveillance event detection was evaluated using airport surveillance video provided by the UK Home Office. Many resources created by NIST and the TRECVID community are available for continued research on this data independent of TRECVID. See the Past data section of the TRECVID website for pointers.

In 2010 TRECVID confronted known-item search and semantic indexing systems with a new set of Internet videos (referred to in what follows as IACC) characterized by a high degree of diversity in creator, content, style, production qualities, original collection device/encoding, language, etc - as is common in much "Web video". The collection also has associated keywords and descriptions provided by the video donor. The videos are available under Creative Commons licenses from the Internet Archive. The only selection criteria imposed by TRECVID beyond the Creative Commons licensing is one of video duration - they are short (less than 6 min). In addition to the IACC data set, NIST began developing an Internet multimedia test collection (HAVIC) with the Linguistic Data Consortium and used it in growing amounts (up to 8000 h) in TRECVID 2010-present. The airport surveillance video, introduced in TRECVID 2009, has been reused each year since.

New in 2013 was video provided by the BBC. Programming from their long-running EastEnders series was used in the instance search task. An additional 600 h of Internet Archive video available under Creative Commons licensing for research (IACC.2) was used for the semantic indexing task. As planned, the 2013 tasks were continued in 2014 with some new test data.

Now ...

In TRECVID 2015 NIST will continue 4 of the 2014 tasks with some revisions (SIN, INS, MED, SED), drop one (MER), separate out the localization task (LOC) from semantic indexing, and add a new Video Hyperlinking task (LNK) previously run in MediaEval:

Semantic indexing [IACC]
Surveillance event detection [i-LIDS]
Instance search [BBC EastEnders]
Multimedia event detection [HAVIC]
Localization [IACC]
Video Hyperlinking [BBC for Hyperlinking]

2. Video data:

A number of datasets are available for use in TRECVID 2015 and are described below.

Once you know which tasks you will be participating in, you can determine which data sets you need.
Then for each needed dataset, see below for information on how you get permission to use the data and how it will be distributed..

IACC.2.A-C

Three datasets (A,B,C) - totaling approximately 7300 Internet Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format with duration ranging from 10 s to 6.4 min and a mean duration of almost 5 min. Most videos will have some metadata provided by the donor available e.g., title, keywords, and description.

NOTE: Be sure to reload the relevant collection.xml files (A, B, C) in the master shot reference and remove files with a "use" attribute set to "dropped" - these are no longer available under a Creative Commons license and are not part of the test collection.

Data use agreements and Distribution: Download for active participants from NIST/mirror servers. See Data use agreements

Master shot reference: Will be available to active participants by download from the active participant's area of the TRECVID website.

Master I-Frames for Localization: Will be extracted by NIST using ffmpeg for all videos in the IACC.2.C collection and made available to active participants by download (circa 18 GB) from the active participant's area of the TRECVID website.

Automatic speech recognition (for English): Will be available to active participants by download from the active participant's area of the TRECVID website.

IACC.1.A-C

Three datasets (A,B,C) - totaling approximately 8000 Internet Archive videos (160 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format with duration between 10s and 3.5 min. Most videos will have some metadata provided by the donor available e.g., title, keywords, and description

Data use agreements and Distribution: Available by download from the Internet Archive. See TRECVID Past Data page. Or download from the copy on the Dublin City University server, but use the collection.xml files (see TRECVID past data page) for instructions on how to check the current availability of each file.

Master shot reference: Available by download from the TRECVID Past Data page

Automatic speech recognition (for English): Available by download from the TRECVID Past Data page

IACC.1.tv10.training

Approximately 3200 Internet Archive videos (50 GB, 200 h) with Creative Commons licenses in MPEG-4/H.264 format with durations between 3.6 and 4.1 min Most videos will have some metadata provided by the donor available e.g., title, keywords, and description

Data use agreements and Distribution: Available by download from the Internet Archive. See TRECVID Past Data page. Or download from the copy (see tv2010 directory) on the Dublin City University server, but use the collection.xml files (see TRECVID past data page) for instructions on how to check the current availability of each file.

Master shot reference: Available by download from the TRECVID Past Data page

Common feature annotation: Available by download from the TRECVID Past Data page

Automatic speech recognition (for English): Available by download from the TRECVID Past Data page

Gatwick and i-LIDS MCT airport surveillance video

The data consist of about 150 h of airport surveillance video data (courtesy of the UK Home Office). The Linguistic Data Consortium has provided event annotations for the entire corpus. The corpus was divided into development and evaluation subsets. Annotations for 2008 development and test sets are available.

Data use agreements and Distribution:

Gatwick development data (2008 DevSet and 2008 EvalSet) by download from password-protected servers at NIST and mirror sites. See Data use agreements
2009 i-LIDS test data from United Kingdom's Centre for Applied Science and Technology (CAST) can be downloaded from NIST but only after CAST has received the required information and issued a userid/password. See here for details.

Development data annotations: available by download.

BBC EastEnders

Approximately 244 video files (totally 300 GB, 464 h) with associated metadata, each containing a week's worth of BBC EastEnders programs in MPEG-4/H.264 format.

Data use agreements and Distribution: Download and fill out the data permission agreement from the active participants' area of the TRECVID website. After the agreement has been processed by NIST and the BBC, the applicant will be contacted by Dublin City University with instructions on how to download from their servers. See Data use agreements

Master shot reference: Will be available to active participants by download from the TRECVID 2015 active participant's area.

Automatic speech recognition (for English): Will be available to active participants by download from Dublin City University.

BBC for Video Hyperlinking

For the video dataset, we have agreed with the BBC to use a dataset that will comprise between 2500-3500 hours of BBC video content. The data will be accompanied by archival metadata (e.g., subtitles, short program descriptions, list of popular UK celebrities) and automatic annotations (e.g., speech transcripts, shot segmentation, face detection, different versions of concept detectors). The metadata will be available under a nondisclosure type of agreement. Additionally, task participants are welcome to use (and share) other metadata.

Here are example files for system development:

Here is the set of test anchors.

Data use agreements and Distribution:

Video and ASR will be available from a server at University of Twente in the Netherlands. Please download the appropriate permission forms from the active participant's area and follow the instructions at the top of each form to receive the download information.

HAVIC

HAVIC is a large collection of Internet multimedia constructed by the Linguistic Data Consortium and NIST. Participants will receive training corpora, event training resources, and two development test collections. Participants will also receive an evaluation collection which is the same as last year; either of:

~ 8,000 hour MED14 search collection (used for the evaluation) or
~1,300 hour MED14 search subset for participants with limited computing resources (used for the evaluation)

Data use agreements and Distribution: Data licensing and distribution will be handled by the Linguistic Data Consortium. The MED'15 website is up and operational. Currently, only the data license agreement is on the site. All teams (even pastparticipants) must submit a license agreement to the LDC.

3. Data use agreements handled by NIST (Gatwick (2008), IACC.2, BBC EastEnders)

In order to be eligible to receive the data, you must have have applied for participation in TRECVID. Your application will be acknowledged by NIST with a team ID, and active participant's password, and information about how to obtain the data.

If you will be using i-LIDS (2009) HAVIC data, or BBC for Hyperlinking, NIST will NOT be handling the data use agreements. See the "Data Use Agreements and Distribution" section for i-LIDS, HAVIC, or BBC for Hyperlinking.

If you will be using IACC.1 video, the data use agreements are available from the "Past data" webpage. You will be downloading the data from the Dublin City University server (see above) or the Internet Archive. See the "Data Use Agreements and Distribution" section for IACC.1

If you will be needing to get a copy of Gatwick(2008), IACC.2, or BBC EastEnders data you will need to complete the relevant permission forms (from the active participant's area) and email the scanned page images for each form as one Adobe Acrobat pdf of the document to Angela Ellis.
Note that all of the IACC.2 and EastEnders data was made available last year. So if you signed the permission form last year and do not need to replace your original copy then you do not need to submit another permission form this year.
In your email include the following:
```
As Subject: "TRECVID data request"
In the body: your name
             your short team ID (given when you applied to participate)
             the kinds of data you will be using - one or more of the following:
	     Gatwick (2008), IACC.2, and/or BBC EastEnders 
```
You will receive instructions on how to download the data.

Please ask only for the test data (and optional development data) required for the task(s) you apply to participate in and intend to complete.

Requests are handled in the order they are received. Please allow 5 business days for NIST to respond to your request.for Gatwick or IACC.2 data with the access codes you need to download the data using the information about data servers in the the active participant's area. Requests for the EastEnders data are forwarded within 5 business days to the BBC and from there to DCU, who will contact you with the download information. This process may take up to 3 weeks.

4. Semantic indexing:

Task coordinator: Georges Quénot

This task will be coordinated by Georges Quénot from the Laboratoire d'Informatique de Grenoble in collaboration with NIST.

Automatic assignment of semantic tags representing visual or multimodal concepts (previously "high-level features") to video segments can be fundamental technology for filtering, categorization, browsing, search, and other video exploitation. New technical issues to be addressed include methods needed/possible as collection size and diversity increase, when the number of concepts increases, and when concepts are related by an ontology.

In 2015 the task will again support experiments in the no annotation condition. The idea is to promote the development of methods that permit the indexing of concepts in video shots using only data from the Web or archives without the need of additional annotations. The training data could for instance consist of images retrieved by a general purpose search engine (e.g. Google) using only the concept name and/or definition with only automatic processing of the returned images. This will not be implemented as a new variant of the task but by using additional categories for the training types besides the A to D ones (see below). By "no annotation", we mean here that no annotation should be manually done on the retrieved samples (either images or videos). Any annotation done by somebody else prior to the general search does not count. Methods developed in this context could be used for building indexing tools for any concept starting only from a name and a definition for it or from a simple query defined for it.

System task:

Given the test collection (IACC.2.C), master shot reference, and concept definitions, return for each target concept a list of at most 2000 shot IDs from the test collection ranked according to their likelihood of containing the target.

Data:

The current test data set (IACC.2.C) is 200 h drawn from the IACC.2 collection using videos with durations between 10 s and 6 min.

The development data set combines the development and test data sets of the 2010 and 2011 issues of the task, IACC.1.tv10.training, IACC.1.A, IACC.1.B, and IACC.1.C each containing about 200 h drawn from the IACC.1 collection using videos with durations ranging from 10s to just longer than 3.5 min. These datasets can be downloaded; see the information provided above.

Concepts and relations:

500 concepts were selected for the TRECVID 2011 semantic indexing task. In making this selection, the organizers drew from the 130 used in TRECVID 2010, the 374 selected by CU/Vireo for which there exist annotations on TRECVID 2005 data, and some from the LSCOM ontology. From these 500 concepts, 346 concepts were selected for the full task in 2011 as those for which there exist at least 4 positive samples in the final annotation. A spreadsheet of the concepts is available here with complete definitions and an alignment with CU-VIREO374 where appropriate. [Don't be confused by the multiple numberings in the spreadsheet - use the TV13-15 IDs in the concept lists below under "Submissions".] For 2015 the same list of 500 concepts has been used as a starting point for selecting the 60 single concepts for which participants must submit results.

The organizers have provided again a set of relations between the concepts. There are two types of relations: A implies B and A excludes B. Relations that can be derived by transitivity will not be included. Participants are free to use the relations or not and submissions are not required to comply with them.

It is expected that advanced methods will use the annotations of non-evaluated concepts and the ontology relations to improve the detection of the evaluated concepts. The use of the additional annotations and of ontology relations is optional and comparison between methods that use them and methods that do not is encouraged.

Collaborative annotation:

collaborative annotations

Common non-annotated training data:

The organizers will try to provide a common list of image URLs retrieved by issuing a query to a general purpose search engine for each target concept so that experiments in this category can be carried out using common non-annotated training data. This should remove the need for participants to do this preliminary work and should ease the comparison between methods by using the same collected data. This does not prevent other participants to do their own gathering of non-annotated training data. Contact Georges.Quenot at imag.fr for further information.

Sharing of components:

wiki

Submission types:

There will be only one type of submission:

"main" in which participants will be ready to provide a result for these 60 single concepts drawn from the 346 used in TRECVID 2012

NOTE: Participants who submitted Progress runs in 2013 and/or 2014 against the IACC.2.C data will have their 2015 submissions compared by NIST to their earlier ones. Here are the progress runs submitted in 2013 and/or 2014 against the IACC.2.C data.

Training types:

P l e a s e n o t e t h e s e r e s t r i c t i o n s and this information on training types. The submission types (main) are orthogonal to the training types (A, B, C ...).

Submissions:

Each team may submit a maximum of 4 prioritized main runs with 2 additional if they are of the "no annotation" training type and the others are not. The submission formats are described below.

Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmision.xml.

Participants in the main version of the task against IACC.2.C will submit real results in each run for all and only the 60 selected concepts and for each concept at most 2000 shot IDs.
Here for download (right click and choose "display page source" so see the entire files) is a DTD for concept (feature) extraction results of one main run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check all your submissions to see that they are well-formed.
Please submit each of your runs in a separate file, named to make clear which team has produced it. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement that refers to the DTD at NIST via a URL and with a videoFeatureExtractionResults element even though there is only one run is included. Each submitted file must be compressed using just one of the following: gzip, tar, zip, or bzip2.
Remember to use the correct shot IDs in your submissions - the ones from video segment elements the mp7 files in the master shot reference with the format shotFILENUMBER_SHOTNUMBER. Do not use the ID associated with the video (TRECVID2015_FILENUMBER) or the one associated with a keyframe (shotFILENUMBER_SHOTNUMBER_RKF).
Submissions will be transmitted to NIST via a password-protected webpage

Evaluation:

A subset of the submitted concept results (at least 20), to be announced only after the submission date, will be evaluated by assessors at NIST pooling and sampling.

Please note that NIST uses a number of rules in manual assessment of system output.

Measures (indexing):

Mean extended inferred average precision (mean xinfAP), which allows sampling density to vary e.g. so that it can be 100% in the top strata, which are most important for average precision.
As in past years, other detailed measures based on recall, precision will be provided by the sample_eval software.

Issues:

None

5. Surveillance Event Detection (SED):

Task coordinator: Martial Michel

Detecting human behaviors efficiently in vast amounts surveillance video is fundamental technology for a variety of higher-level applications of critical importance to public safety and security. The use case addressed by this task is the retrospective exploration of surveillance video archives optionally using a system designed to support the optimal division of labor between a human user and the software - an interactive system.

System task:

Two event detection tasks will be supported - interactive event detection and a retrospective event detection:

Interactive Event Detection: Given a collection of surveillance video data files (e.g. that from an airport, or commercial establishment) for preprocessing, at test time detect observations of events based on the event definition and for each return the elapsed search time and a list of video segments within the surveillance data files, ranked by likelihood of meeting the need described in the topic. Each search for an event by a searcher can take no more than 25 elapsed minutes, measured from the time the searcher is given the event to look for until the time the result set is considered final.
Retrospective Event Detection: The task is to detect observations of events based on the event definition. Systems may process the full corpus using multiple passes prior to outputting a list of putative events observations. The primary condition for this task will be single-camera input (i.e., the camera views are processed independently). Multiple-camera input may optionally be run as an additional contrastive condition.

A SED event is defined to be "an observable action or change of state in a video stream that would be important for airport security management". Events may vary greatly in duration, from 2 frames to longer duration events that can exceed the bounds of the excerpt. 2015 systems should output detection results for any three events in the following list: PersonRuns, CellToEar, ObjectPut, PeopleMeet, PeopleSplitUp, Embrace, and Pointing.

The main evaluation (EVAL15) will be implemented using a 9 hour subset of the multi-camera airport surveillance domain evaluation data collected by the Home Office Scientific Development Branch (HOSDB). We are introducing for 2015, a new Group Dynamic Subset (SUB15) using only 2 hours of this video and limited to the Embrace, PeopleMeet and PeopleSplitUp events.

Data:

The test data will be the same i-LIDS data that was made available to participants for previous SED evaluations. The actual data subset the evaluation will be performed on will be the same as in the SED 2014 task

Submissions:

Submissions will follow the same format and procedure as in the SED 2014 task. The number of submissions allowed will be defined in the Evaluation Plan. Participants must submit at least one interactive run. An automatic version of each interactive run for comparison may also be submitted.

Evaluation:

The updated SED 2015 webpage with a link to the detailed evaluation plan is now available.

It is assumed the user(s) are system experts and no attempt will be made to separate the contribution of the user and the system. The results for each system+user will be evaluated by NIST as to effectiveness - using standard search measures (e.g., probability of missed detection/false alarm, precision, recall, average precision)- self-reported speed, and user satisfaction (for interactive runs).

Issues:

None

6. Instance search:

Task coordinator: George Awad

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a certain specific person, object, or place, given a visual example.

In 2015 NIST will create about 30 topics, of which the first 24 will be used for interactive systems. The task will again use the EastEnders data, prepared with major help from several participants in the AXES project (Access to Audiovisual Archives), a four-year FP7 framework research project to develop tools that provide various types of users with new and engaging ways to interact with audiovisual libraries.

System task:

Given a collection of test video, a master shot reference, and a collection of topics (queries) that delimit a person, object, or place entity in some example video, locate for each topic up to the 1000 shots most likely to contain a recognizable instance of the entity. Interactive runs are welcome and will likely return many fewer than 1000 shots. The development of fast AND effective search methods is encouraged.

Data:

Development data: A very small sample (File ID=0) of the BBC Eastenders test data will be available from Dublin City University. No actual development data will be supplied. File 0 is therefore NOT part of the test data and no shots from File 0 should be part of any submission.

Test data: The test data for 2015 will be BBC EastEnders video in MPEG-4 format. The example images in the topics will be in bmp format. See above for information on how to get a copy of the test data.

Topics: Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos containing the item of interest in a variety of sizes to the extent possible. For each frame image there will be a binary mask of the region of interest (ROI), as bounded by a single polygon and the ID from the master shot reference of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we will assume the searcher wants to keep the process simple. So, where multiple targets appear in the image only the most prominent will be inlcuded in the ROI. The ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. The shots from which example images are drawn for a given concept, will be filtered by NIST from system submissions for that concept before evaluation. Each topic will include an indication of the target type taken from this set of strings {PERSON, LOCATION, OBJECT}. As in 2014, the topic targets will include mostly small and large rigid objects, logos, and people/animals. Also included will be a new attribute, ("multi=(Y|N)") that indicates whether there are multiple targets (e.g. in the case of logos and other objects manufactured to look the same) or just one target (e.g., a specific person, place, building, etc). As in the past, when there are multiple targets the topic text will begin with "A" or "An". When there is just one target, the topic text will begin with "This" or "These" (for a unique set).

Here is an example of a set of topics and here is a pointer to the DTD for an instance search topic (you may need to right click on "view source".

We will allow teams to submit multiple runs (to be counted only as one against the maximum allowed) as long as those runs differ only in what set of examples for a topic are used. The sets will be defined as follows (in the DTD):

A - one or more provided images - no video
E - video examples (+ optionally image examples)

Auxiliary data: Participants are allowed to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the tv15.list as soon as they discover them.

Submissions:

Each team may submit a maximum of 4 prioritized runs (note the example set exception mentioned above allowing up to 8 runs in one specific case). All runs will be evaluated but not all may be included in the pools for judgment. Submissions will be identified as either fully automatic or interactive. Interactive runs will be limited to 15 elapsed minutes per search.

Here for download (though they may not display properly) is the DTD for search results of one run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed

Please submit each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSearchResults element even if only one run is included.

Submissions will be transmitted to NIST via a password-protected webpage.

Evaluation:

This task will be treated as a form of search and will accordingly be evaluated with average precision for each topic in each run and per-run mean average precision over all topics. Speed will also be measured: clock time per topic search, reported in seconds (to one decimal place).

Issues:

None

7. Multimedia event detection:

Task coordinator: Jon Fiscus

Video is becoming a new means of documenting everything from recipes to how to change a tire of a car. Ever expanding multimedia video content necessitates development of new technologies for retrieving relevant videos based solely on the audio and visual content of the video. Participating MED teams will create a system that quickly finds events in a large collection of search videos.

System task:

Given an evaluation collection of videos (files) and a set of event kits, provide a rank and confidence score for each evaluation video as to whether the video contains the event. Both the Pre-Specified and AdHoc Event tasks will be supported.

Data:

NIST will create up to 10 new AdHoc event kits. The development data will be the same as last year. The evaluation search collection will be the same as MED '14, i.e., HAVIC Progress and Novel1 data.

Submissions:

Submissions this year will not use the MED I/O server. Instead, submissions will follow the MED '13 paradigm of submissions being made as a single tarball bundle and minimal hardware/runtime reporting. Each team can submit up to 5 runs Pre-Specified Event runs and up to 2 AdHoc event runs. Each run must contain results for a given condition.

Evaluation:

For Ad-Hoc each event, the submissions will be pooled across all runs and a sample judged by human assessors at NIST. Mean inferred average precision will be used to measure run-level effectiveness. Details on the evaluation will be posted on this website.

Open issues

When will the test events be distributed?
[RESOLVED - no charge] Will LDC charge for new copies of the data?

8. Localization:

Task coordinator: George Awad

The localization task will challenge systems to make their concept detection more precise in time and space. Currently SIN systems are accurate to the level of the shot. In the localization task, systems will be asked to determine the presence of the concept temporally within the shot, i.e., with respect to a subset of the frames comprised by the shot, and, spatially, for each such frame that contains the concept, to a bounding rectangle. The localization will be restricted to 10 concepts from those chosen and used in the semantic Indexing task. However, systems can participate in the localization task without submitting runs in the Semantic Indexing task as both tasks this year are run Independently.

Systems task:

Semantic Indexing main task

Data:

The current test data set (IACC.2.C) is a collection of 200 h drawn randomly from the IACC.2 collection.

The development data comprises the IACC.2.A and IACC.2.B data sets. Manual judgments of I-frames from the localization evaluation in 2013 and 2014, including bounding boxes, are available as "localization truth data" from the 2013 and 2014 "Past Data" webpages.

Concepts:

Participants will be ready to submit localization results for the following 10 concepts:

Submissions:

Participants in the localization task will submit in one file per run (up to maximum 4 runs allowed), the localization data for all and only the concept-containing I-Frames in the list of shots distributed by NIST. A standard set of I-Frames, grouped by each master shot and test video file, will be extracted by NIST using ffmpeg and made available to participants.

Each line of submitted localization data will contain the following in groups of ASCII characters separated by 1 space. X and Y coordinates refer to the bounding rectangle. Assume the UpperLeft point in each frame image has coordinates (0,0), LowerRightX > UpperLeftX, LowerRightY > UpperLeftY.

Concept#
|   File# Frame#
|   |     |    UpperLeftX 
|   |     |    |   UpperLeftY
|   |     |    |   |   LowerRightX
|   |     |    |   |   |   LowerRightY 
|   |     |    |   |   |   |
xxx xxxxx xxxx xxx xxx xxx xxx

Example:
31 30356 50 36 20 150 125

Please submit each of your runs in a separate file, named to make clear which team has produced it.
Submissions will be transmitted to NIST via a password-protected webpage

Evaluation:

Measures: Temporal and spatial localization will be evaluated using precision and recall based on the judged items at two levels - the frame and the pixel, respectively. NIST will then calculate an average for each of these values for each concept and for each run.
For each shot that is judged to contain a concept and in the distributed list of shots, a subset of the shot's I-Frames will be viewed and annotated to locate the pixels representing the concept. The set of annotated I-Frames will then be used to evaluate the localization for the I-Frames submitted by the systems.

Issues:

None

9. Video Hyperlinking:

Task coordinator: Roeland Ordelman

Digital archives of professional and user generated multimedia content are currently stored in abundance by broadcasting companies and internet sharing platforms. The videos within these collections can be interconnected by the topic itself, events or activities depicted, the people present in the videos, etc. In this task, we envisage a scenario where users are interested to find further information on some aspect of the topic of interest contained within a video segment, and that they do this by navigating via a hyperlink to other parts of the video collection. To facilitate this searchers need to be provided with the capability of jumping from one part of a video to another within the archive. This requires the construction of a network of hyperlinks between different parts of the videos based on a combination of visual and audio content features, and potentially metadata annotations.

The main research objectives of the TRECVid 2015 Video Hyperlinking task are to investigate the properties of multimodal anchors for use as sources in video hyperlinking, to propose and investigate methods for automated hyperlink creation from this anchors, to explore the relation between relevance in a video search task and a video hyperlinking, and ultimately in future editions of the task to define approaches for enabling personalised story telling or creation of narratives within video collections.

Task:

Given a set of test videos with metadata with a defined set of anchors, each defined by start time and end time in the video, return for each anchor a ranked list of hyperlinking targets: video segments defined by a video ID and start time and end time (possibly of segmented media/video fragments, this is still being defined at the time of this document). Hyperlinking targets pointing to the video where the anchor was extracted from should be excluded and will be disregarded during the evaluation.

Data:

Anchors will be defined by users in a specific user community, for which we plan to use professional users such as journalists and media-researchers. The users will also provide details of why they selected a particular anchor and what kind of targets they expect for this anchor. This information will be used for evaluation and will not be visible to task participants.

example development anchors, example ground-truth for the development anchors, and the set of test anchors

Submissions:

A team can submit up to four runs, for which a short description is necessary.

The mandatory format of each item returned by a run is similar to the usual TREC format:

Participants can provide up to 1000 proposed link targets of which only a subset will be used for evaluation.

Evaluation:

Top ranking results of participant submissions will be assessed using a mechanical turk (MT) crowdsourcing approach, assessing the top ranked documents. We will also run a test assessment on a smaller part of the data by a local team of target users to identify potential discrepancies between the MT workers' judgments and those of the target user group. Descriptions given by the anchor creators (anchor descriptions, description and format of requested targets) will be used for evaluation purposes.

The submissions will be evaluated at least based on the precision at a certain rank measure, adapted to unconstrained time segments, see [1].

Additional evaluation measures are currently being investigated.

Submissions:

Runs will be submitted to a password-protected website at NIST and then forwarded by NIST to the task coordinator for evaluation.

Issues:

[RESOLVED - see Schedule] Dates for data availability, submissions due, etc are will be provided by the time the Guidelines are final on 1. April
[RESOLVED - see Data] Procedures for data download are still to be determined.
[RESOLVED - see Submissions above] Run submission procedure

[1] Adapting binary information retrieval evaluation metrics for segment-based retrieval tasks R Aly, M Eskevich, R Ordelman, GJF Jones - arXiv preprint arXiv:1312.1913, 2013

10.Schedule:

The following are the target dates for 2015:

------------
2. Feb: NIST sends out Call for Participation in TRECVID 2015
23. Feb: Applications for participation in TRECVID 2015 due at NIST
------------
1. Mar: Final versions of TRECVID 2014 papers due at NIST
------------
1. Apr: Guidelines complete
------------
8. May: LNK: Development data available
------------
1. Jun: SED: Dry run starts; LNK: Test anchors available; MED: System Input File Set CSV files provided by NIST; INS: Instance search topics available from NIST
22. Jun: SIN: Submission webpage open (trial submissions highly recommended)
30. Jun: SED: Dry run evaluation ends
------------
6. Jul: SIN: Semantic indexing task submissions due at NIST for evaluation by NOON, Washington, DC time
10. Jul: LNK: Submission webpage open; SIN: Semantic indexing assessment at NIST
17. Jul: LNK: final submissions due at NIST by NOON Washington, DC time
22. Jul: SED: submissions due at NIST
23. Jul: INS: Submission webpage open (trial submissions highly recommended)
24. Jul: LOC: NIST sends out list of shots for localization
29. Jul: SED: initial scoring results returned to participants; MED: AdHoc events released as event texts and a list of video files from the Omnibus Collection
------------
1. Aug: Workshop general and registration information available; TRECVID Workshop general and registration information
5. Aug: MED: AdHoc submissions due at NIST by NOON Washington, DC time; SED: final scoring results returned to participants
7. Aug: INS: Instance search submissions due at NIST by NOON Washington, DC time
12. Aug: MED: Pre-specified submissions due at NIST by NOON Washington, DC time
12. - 28. Aug: MED: Multimedia event detection judging
12. Aug - 3. Sep: INS: Instance search assessment at NIST
14. Aug: SIN: Evaluation results returned; LNK: Evaluation results returned
19. Aug: LOC: Submission webpage open (trial submissions highly recommended)
------------
2. Sep: LOC: Localization task submissions due at NIST by NOON Washington, DC time; LOC: Localization task assessment at NIST; INS: Instance search task results returned
18. Sep: LOC: localization task results returned
------------
2. Oct: Speaker proposals due at NIST by noon Washington, DC time
9. Oct: Final Agenda posted
23. Oct: Workshop notebook papers due at NIST by NOON Washington, DC time (no late submissions allowed); Copyright forms due back at NIST (see Notebook papers for instructions)
9. Nov: TRECVID 2015 Workshop registration closes at 5:00PM Washington, DC time
------------
16-18. Nov (Mon.-Wed): TRECVID Workshop (2.5 days) at NIST in Gaithersburg, MD, USA
---------------- 1. Mar 2016: Final versions of TRECVID 2015 papers due at NIST

George Awad (NIST)

General Coordinators:

Alan Smeaton (Insight Centre for Data Analytics, Dublin City University)

Wessel Kraaij (TNO, Radboud University Nijmegen)

Task coordinators:

See the guidelines section for the task of interest.

Email lists:

Information and discussion for active workshop participants

[email protected]
archive open to active participants only
NIST will subscribe the contact listed in your application to participate when we have received it. Additional members of active participant teams will be subscribed by NIST if they send email to the TRECVID Project Leader. indicating they want to be subscribed, the email address to use, their name, and providing the TRECVID 2015 active participant's password. Groups may combine the information for multiple team members in one email. Please use this format for submitting each name and email address (one name+address per line):
```
Pat Doe <[email protected]>
```
Once subscribed, you can post to this list by sending you thoughts as email to [email protected], where they will be sent out to EVERYONE subscribed to the list, i.e., all the other active participants.

Information and discussion on the surveillance event detection task

[email protected]
If you would like to subscribe, see the event detection webpage for contact information.

Information and discussion on the multimedia event detection task.
- [email protected]
- If you would like to subscribe, see the MED webpage for contact information.

National Institute of
Standards and Technology Home

last updated:
Date created: Thursday, 8-Jan-15
For further information contact [email protected]

Guidelines for TRECVID 2015

0. Table of Contents:

Then ...

Now ...

System task:

Data:

Concepts and relations:

Collaborative annotation:

Common non-annotated training data:

Sharing of components:

Submission types:

Training types:

Submissions:

Evaluation:

Issues:

System task:

Data:

Submissions:

Evaluation:

Issues:

System task:

Data:

Submissions:

Evaluation:

Issues:

Task coordinator: Jon Fiscus

System task:

Data:

Submissions:

Evaluation:

Open issues

Task coordinator: George Awad

Systems task:

Data:

Concepts:

Submissions:

Evaluation:

Issues:

Task coordinator: Roeland Ordelman

Task:

Data:

Submissions:

Evaluation:

Submissions:

Issues:

11. Outstanding 2015 guideline work items