The main goal of the TREC Video Retrieval Evaluation (TRECVID) is to promote progress in content-based analysis of and retrieval from digital video via open, metrics-based evaluation. TRECVID is a laboratory-style evaluation that attempts to model real world situations or significant component tasks involved in such situations.
Up until 2010, TRECVID used test data from a small number of known professional sources - broadcast news organizations, TV program producers, and surveillance systems - that imposed limits on program style, content, production qualities, language, etc. In 2003 - 2006 TRECVID supported experiments in automatic segmentation, indexing, and content-based retrieval of digital video using broadcast news in English, Arabic, and Chinese. TRECVID also completed two years of pilot studies on exploitation of unedited video rushes provided by the BBC. In 2007 - 2009 TRECVID provided participants with cultural, news magazine, documentary, and education programming supplied by the Netherlands Institute for Sound and Vision. Tasks using this video included segmentation, search, feature extraction, and copy detection. Systems were tested in rushes video summarization using the BBC rushes. Surveillance event detection was evaluated using airport surveillance video provided by the UK Home Office. Many resources created by NIST and the TRECVID community are available for continued research on this data independent of TRECVID. See the Past data section of the TRECVID website for pointers.
In 2010 TRECVID confronted known-item search and semantic indexing systems with a new set of Internet videos (referred to in what follows as IACC) characterized by a high degree of diversity in creator, content, style, production qualities, original collection device/encoding, language, etc. - as is common in much "Web video". The collection also has associated keywords and descriptions provided by the video donor. The videos are available under Creative Commons licenses from the Internet Archive. The only selection criterion imposed by TRECVID beyond Creative Commons licensing is video duration - the videos are short (less than 6 min). In addition to the IACC data set, NIST began developing an Internet multimedia test collection (HAVIC) with the Linguistic Data Consortium and used it in growing amounts (up to 8000 h) in TRECVID 2010-present. The airport surveillance video, introduced in TRECVID 2009, has been reused each year since.
New in 2013 was video provided by the BBC. Programming from their long-running EastEnders series was used in the instance search task. An additional 600 h of Internet Archive video available under Creative Commons licensing for research (IACC.2) was used for the semantic indexing task.
In TRECVID 2014 NIST will continue the 2013 tasks with minor revisions and some new test data as follows:
To accommodate TRECVID participants who also plan to attend the ACM Multimedia Conference in Orlando, Florida, 3-7 November, we are holding TRECVID 2014 at the University of Central Florida (Orlando), 10-12 November in collaboration with UCF's Center for Research in Computer Vision.
A number of datasets are available for use in TRECVID 2014 and are described below.
Three datasets (A, B, C) - totaling approximately 7300 Internet Archive videos (144 GB, 600 h) under Creative Commons licenses, in MPEG-4/H.264 format, with durations ranging from 10 s to 6.4 min and a mean duration of almost 5 min. Most videos will have some metadata provided by the donor, e.g., title, keywords, and description. NOTE: Be sure to check the relevant collection.xml file (A, B, C) in the master shot reference and remove files with a "use" attribute set to "dropped" - these are no longer available under a Creative Commons license and are not part of the test collection.
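The check for dropped files can be scripted. Below is a minimal sketch, assuming each video is listed in collection.xml as a VideoFile element with "use" and "filename" attributes (verify these names against the actual file); only the "use"/"dropped" convention comes from the note above.

    # Sketch: keep only videos still part of the test collection, i.e., those whose
    # "use" attribute is not "dropped". Element and attribute names other than "use"
    # are assumptions - adjust to match the actual collection.xml.
    import xml.etree.ElementTree as ET

    def available_videos(collection_xml):
        root = ET.parse(collection_xml).getroot()
        return [v.get("filename")                      # assumed attribute name
                for v in root.iter("VideoFile")        # assumed element name
                if v.get("use") != "dropped"]          # dropped files are excluded

    for name in available_videos("iacc.2.A.collection.xml"):   # hypothetical file name
        print(name)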
Data use agreements and Distribution: Download for active participants from NIST/mirror servers. See Data use agreements
Master shot reference: Will be available to active participants by download from the active participant's area of the TRECVID website.
Master I-Frames: Will be extracted by NIST using ffmpeg for all videos in the IACC.2.B collection and made available to active participants by download (circa 18 GB) from the active participant's area of the TRECVID website. The I-Frames will be of higher quality than in 2013 and thus larger.
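For participants who want to reproduce a comparable set of I-Frames locally, the following is a minimal sketch using ffmpeg's select filter; the guidelines only state that NIST uses ffmpeg, so the exact options and output image format used by NIST are assumptions here.

    # Sketch: extract only I-Frames from a video with ffmpeg, one image per I-Frame.
    import subprocess

    def extract_iframes(video_path, out_pattern="iframe_%05d.png"):
        subprocess.run([
            "ffmpeg", "-i", video_path,
            "-vf", "select=eq(pict_type\\,I)",   # keep only I-Frames
            "-vsync", "vfr",                     # write one image per selected frame
            out_pattern,
        ], check=True)

    extract_iframes("some_iacc_2_b_video.mp4")   # hypothetical file name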
Automatic speech recognition (for English): Will be available to active participants by download from the active participant's area of the TRECVID website.
Three datasets (A, B, C) - totaling approximately 8000 Internet Archive videos (160 GB, 600 h) under Creative Commons licenses, in MPEG-4/H.264 format, with durations between 10 s and 3.5 min. Most videos will have some metadata provided by the donor, e.g., title, keywords, and description.
Data use agreements and Distribution: Available by download from the Internet Archive. See the TRECVID Past Data page. Alternatively, download the copy on the Dublin City University server, but consult the collection.xml files (see the TRECVID Past Data page) to check the current availability of each file.
Master shot reference: Available by download from the TRECVID Past Data page
Automatic speech recognition (for English): Available by download from the TRECVID Past Data page
Approximately 3200 Internet Archive videos (50 GB, 200 h) with Creative Commons licenses, in MPEG-4/H.264 format, with durations between 3.6 and 4.1 min. Most videos will have some metadata provided by the donor, e.g., title, keywords, and description.
Data use agreements and Distribution: Available by download from the Internet Archive. See TRECVID Past Data page.
Master shot reference: Available by download from the TRECVID Past Data page
Common feature annotation: Available by download from the TRECVID Past Data page
Automatic speech recognition (for English): Available by download from the TRECVID Past Data page
The data consist of about 150 h of airport surveillance video data (courtesy of the UK Home Office). The Linguistic Data Consortium has provided event annotations for the entire corpus. The corpus was divided into development and evaluation subsets. Annotations for 2008 development and test sets are available.
Data use agreements and Distribution:
Development data annotations: available by download.
Approximately 244 video files (totaling about 300 GB and 464 h) with associated metadata, each containing a week's worth of BBC EastEnders programs in MPEG-4/H.264 format.
Data use agreements and Distribution: Download and fill out the data permission agreement from the active participants' area of the TRECVID website. After the agreement has been processed by NIST and the BBC, the applicant will be contacted by Dublin City University with instructions on how to download from their servers. See Data use agreements
Master shot reference: Will be available to active participants by download from the TRECVID 2014 active participant's area.
Automatic speech recognition (for English): Will be available to active participants by download from Dublin City University.
HAVIC is a large collection of Internet multimedia constructed by the Linguistic Data Consortium and NIST. Participants will receive training corpora, event training resources, and two development test collections. Participants will also receive a new test collection; either of:
Data use agreements and Distribution: Data licensing and distribution will be handled by the Linguistic Data Consortium. See the Multimedia Event Detection task webpage for details.
In order to be eligible to receive the data, you must have applied for participation in TRECVID. Your application will be acknowledged by NIST with a team ID, an active participant's password, and information about how to obtain the data.
Note that all of the IACC.2 and EastEnders data was made available last year. So if you signed the permission form last year and do not need to replace your original copy then you do not need to submit another permission form this year.
In your email include the following:
As Subject: "TRECVID data request"
In the body: your name, your short team ID (given when you applied to participate), and the kinds of data you will be using - one or more of the following: Gatwick (2008), IACC.2, and/or BBC EastEnders.
You will receive instructions on how to download the data.
Please ask only for the test data (and optional development data) required for the task(s) you apply to participate in and intend to complete.
Requests are handled in the order they are received. Please allow 5 business days for NIST to respond to requests for Gatwick or IACC.2 data with the access codes you need to download the data using the information about data servers in the active participant's area. Requests for the EastEnders data are forwarded within 5 business days to the BBC and from there to DCU, who will contact you with the download information. This process may take up to 3 weeks.
This task will be coordinated by Georges Quénot from the Laboratoire d'Informatique de Grenoble in collaboration with NIST.
Automatic assignment of semantic tags representing visual or multimodal concepts (previously "high-level features") to video segments can be fundamental technology for filtering, categorization, browsing, search, and other video exploitation. New technical issues to be addressed include methods needed/possible as collection size and diversity increase, when the number of concepts increases, and when concepts are related by an ontology. In 2014 the task will again support experiments in the following areas:
Main: Given the test collection (IACC.2.B), master shot reference, and single concept definitions, return for each target concept a list of at most 2000 shot IDs from the test collection ranked according to their likelihood of containing the target.
Localization subtask: For each concept from the list of 10 designated for localization, for each shot of the top-ranked 1000 returned in a main task run, for each I-Frame within the shot that contains the target, return the x,y coordinates of the upper left and lower right vertices of a bounding rectangle which contains all of the target concept and as little more as possible. Systems may find more than one instance of a concept per I-Frame and may include more than one bounding box for that I-Frame, but only one will be used in the judging, since the ground truth will contain only one per judged I-Frame (chosen by the NIST assessor), at least in this first round.
Progress: The progress task is simply the main task run additionally and independently on the 2014 progress data set (IACC.2.C). Please note that the 2013 test data and assessments should not be used in 2014 system training or validation; otherwise the progress task would be biased. The same training data and annotations should be used as in 2013; this means no additional annotations on the 2013 development data.
The current test data set (IACC.2.B) will be 200 h drawn from the IACC.2 collection using videos with durations between 10 s and 6 min.
The progress test data set (IACC.2.C) will be an additional non-overlapping collection of 200 h drawn randomly from the IACC.2 collection.
The development data set combines the development and test data sets of the 2010 and 2011 editions of the task - IACC.1.tv10.training, IACC.1.A, IACC.1.B, and IACC.1.C - each containing about 200 h drawn from the IACC.1 collection using videos with durations ranging from 10 s to just longer than 3.5 min. These datasets can be downloaded from the Internet Archive using information available on the TRECVID "past data" webpage.
500 concepts were selected for the TRECVID 2011 semantic indexing task. In making this selection, the organizers drew from the 130 used in TRECVID 2010, the 374 selected by CU/Vireo for which there exist annotations on TRECVID 2005 data, and some from the LSCOM ontology. From these 500 concepts, 346 concepts were selected for the full task in 2011 as those for which there exist at least 4 positive samples in the final annotation. A spreadsheet of the concepts is available here with complete definitions and an alignment with CU-VIREO374 where appropriate. [Don't be confused by the multiple numberings in the spreadsheet - use the TV13-15 IDs in the concept lists below under "Submissions".] For 2014 the same list of 500 concepts has been used as a starting point for selecting the 60 single concepts for which participants must submit results in the main task. The concepts (number to be determined) for localization will be a subset of the main task concepts - perhaps about 10.
The organizers have again provided a set of relations between the concepts. There are two types of relations: A implies B and A excludes B. Relations that can be derived by transitivity will not be included. Participants are free to use the relations or not, and submissions are not required to comply with them.
It is expected that advanced methods will use the annotations of non-evaluated concepts and the ontology relations to improve the detection of the evaluated concepts. The use of the additional annotations and of ontology relations is optional and comparison between methods that use them and methods that do not is encouraged.
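As an illustration of how the relations might be exploited, here is a minimal sketch that expands the "implies" relations by transitivity and flags per-shot concept decisions that conflict with either relation type; the in-memory representation of the relations and the example concept names are assumptions, not the distributed format.

    # Sketch: transitive closure of "A implies B" and consistency checking of one
    # shot's positive concepts against "implies" and "excludes" relations.
    def transitive_closure(implies):
        closure = set(implies)
        changed = True
        while changed:
            changed = False
            for (a, b) in list(closure):
                for (c, d) in list(closure):
                    if b == c and (a, d) not in closure:
                        closure.add((a, d))
                        changed = True
        return closure

    def violations(positives, implies, excludes):
        """positives: set of concepts detected as present in one shot."""
        full = transitive_closure(implies)
        bad = [(a, b) for (a, b) in full if a in positives and b not in positives]
        bad += [(a, b) for (a, b) in excludes if a in positives and b in positives]
        return bad

    print(violations({"Animal", "Indoor"},
                     implies={("Dog", "Animal")},
                     excludes={("Indoor", "Outdoor")}))   # -> [] (consistent)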
The following types of submissions will be considered:
Please note these restrictions and this information on training types. The submission types (main) are orthogonal to the training types (A, B, C ...).
Each team may submit a maximum of 4 prioritized main runs, plus 2 additional runs if those are of the "no annotation" training type and the others are not. One localization run may be submitted with each main submission. In addition to the above, each team may submit up to 2 progress runs on the progress dataset (IACC.2.C). The submission formats are described below.
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmission.xml.
Concept#  File#  Frame#  UpperLeftX  UpperLeftY  LowerRightX  LowerRightY
xxx       xxxxx  xxxx    xxx         xxx         xxx          xxx
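As a small illustration, a line in this layout could be produced as follows; the field values are made up, and the sketch simply assumes whitespace-separated fields in the column order shown above.

    # Sketch: format one localization result line (concept, file, frame, bounding box).
    def localization_line(concept_id, file_id, frame_id, ulx, uly, lrx, lry):
        return f"{concept_id} {file_id} {frame_id} {ulx} {uly} {lrx} {lry}"

    print(localization_line(31, 23209, 118, 10, 20, 210, 180))   # hypothetical values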
A subset of the submitted concept results (at least 20 concepts), to be announced only after the submission date, will be evaluated by assessors at NIST using pooling and sampling.
Please note that NIST uses a number of rules in manual assessment of system output.
Measures (indexing):
Measures (localization): Temporal and spatial localization will be evaluated using precision and recall based on the judged items at two levels - the frame and the pixel, respectively. NIST will then calculate an average for each of these values for each concept and for each run.
For each shot that is judged to contain a concept, a subset of the shot's I-Frames will be viewed and annotated to locate the pixels representing the concept. The set of annotated I-Frames will then be used to evaluate the localization for the I-Frames submitted by the systems.
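The following minimal sketch illustrates the two levels of precision and recall described above for a single concept; it assumes axis-aligned bounding boxes given as (x1, y1, x2, y2) in pixel coordinates and may differ from the exact NIST scoring procedure.

    # Sketch: frame-level and pixel-level precision/recall for localization.
    def pixel_precision_recall(sys_box, truth_box):
        sx1, sy1, sx2, sy2 = sys_box
        tx1, ty1, tx2, ty2 = truth_box
        iw = max(0, min(sx2, tx2) - max(sx1, tx1) + 1)    # overlap width
        ih = max(0, min(sy2, ty2) - max(sy1, ty1) + 1)    # overlap height
        inter = iw * ih
        sys_area = (sx2 - sx1 + 1) * (sy2 - sy1 + 1)
        truth_area = (tx2 - tx1 + 1) * (ty2 - ty1 + 1)
        return inter / sys_area, inter / truth_area       # precision, recall

    def frame_precision_recall(sys_frames, truth_frames):
        hit = len(set(sys_frames) & set(truth_frames))
        return hit / len(sys_frames), hit / len(truth_frames)

    print(pixel_precision_recall((10, 10, 60, 60), (20, 20, 80, 80)))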
For each run, a total elapsed time in seconds will be reported.
Detecting human behaviors efficiently in vast amounts of surveillance video is fundamental technology for a variety of higher-level applications of critical importance to public safety and security. The use case addressed by this task is the retrospective exploration of surveillance video archives using a system designed to support the optimal division of labor between a human user and the software - an interactive system.
Given a collection of surveillance data files (e.g., from an airport or commercial establishment) for preprocessing, at test time take a small set of topics (search requests for known events) and for each return the elapsed search time and a list of video segments within the surveillance data files, ranked by likelihood of meeting the need described in the topic. Each search for an event by a searcher can take no more than 25 elapsed minutes, measured from the time the searcher is given the event to look for until the time the result set is considered final.
The test data will be the same i-LIDS data that was made available to participants for previous SED evaluations. The actual data set is still being defined at the time of this writing.
Submissions will follow the same format and procedure as in the SED 2013 task. The number of submissions allowed will be determined by the time the Guidelines are final. Participants must submit at least one interactive run. An automatic version of each interactive run for comparison may also be submitted.
It is assumed the user(s) are system experts, and no attempt will be made to separate the contributions of the user and the system. The results for each system+user will be evaluated by NIST for effectiveness using standard search measures (e.g., probability of missed detection/false alarm, precision, recall, average precision), self-reported speed, and user satisfaction (for interactive runs).
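A minimal sketch of the missed detection / false alarm idea for one event is shown below; it only illustrates the two rates and does not reproduce the official SED scoring defined in the evaluation plan (in particular, the false-alarm denominator used here, a count of non-target opportunities, is an assumption).

    # Sketch: probability of missed detection and of false alarm for one event.
    def miss_false_alarm(system_detections, reference_events, n_nontarget_opportunities):
        misses = len(reference_events - system_detections)
        false_alarms = len(system_detections - reference_events)
        return (misses / len(reference_events),
                false_alarms / n_nontarget_opportunities)

    p_miss, p_fa = miss_false_alarm({"e1", "e7"}, {"e1", "e2", "e3"}, 1000)
    print(p_miss, p_fa)   # 0.666..., 0.001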
An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, surveillance, law enforcement, protection of brand/logo use) is to find more video segments of a certain specific person, object, or place, given a visual example.
In 2014 NIST will create about 30 topics, of which the first 24 will be used for interactive systems. The task will again use the EastEnders data, prepared with major help from several participants in the AXES project (access to audiovisual archives), a four-year FP7 framework research project to develop tools that provide various types of users with new and engaging ways to interact with audiovisual libraries.
Given a collection of test video, a master shot reference, and a collection of topics (queries) that delimit a person, object, or place entity in some example video, locate for each topic up to the 1000 shots most likely to contain a recognizable instance of the entity. Interactive runs are welcome and will likely return many fewer than 1000 shots.
Development data: A very small sample (File ID=0) of the BBC EastEnders test data will be available from Dublin City University. No actual development data will be supplied. File 0 is therefore NOT part of the test data, and no shots from File 0 should be part of any submission.
Test data: The test data for 2014 will be BBC EastEnders video in MPEG-4 format. The example images in the topics will be in bmp format. See above for information on how to get a copy of the test data.
Topics: Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos containing the item of interest in a variety of sizes to the extent possible. The shots from which example images are drawn for a given topic will be filtered by NIST from system submissions for that topic before evaluation. For each frame image, a binary mask of the region of interest, as bounded by a single polygon, will be provided. Each topic will also include an indication of the target type taken from this set of strings {PERSON, LOCATION, OBJECT}. As in 2013, the topic targets will include mostly small and large rigid objects, logos, and people/animals.
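Since each topic example comes with a binary mask bounded by a single polygon, the sketch below shows one way such a mask could be regenerated from polygon vertices; the image size and vertex coordinates are made-up values, and the Pillow library used here is an implementation choice, not something required by the guidelines.

    # Sketch: turn a single-polygon region of interest into a binary mask image.
    from PIL import Image, ImageDraw

    def polygon_to_mask(width, height, polygon):
        """polygon: list of (x, y) vertices in image coordinates."""
        mask = Image.new("1", (width, height), 0)       # background = 0
        ImageDraw.Draw(mask).polygon(polygon, fill=1)   # region of interest = 1
        return mask

    mask = polygon_to_mask(720, 576, [(100, 80), (300, 90), (280, 300), (120, 280)])
    mask.save("topic_example_mask.bmp")                 # hypothetical output name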
Here is an example of a set of topics, and here is a pointer to the DTD for an instance search topic (you may need to right click and choose "view source").
We will allow teams to submit multiple runs (to be counted only as one against the maximum allowed) as long as those runs differ only in which set of examples for a topic is used. The sets will be defined as follows (in the DTD):
Auxiliary data: Participants are allowed to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the tv14.list as soon as they discover them.
Each team may submit a maximum of 4 prioritized runs (note the example set exception mentioned above). All runs will be evaluated but not all may be included in the pools for judgment. Submissions will be identified as either fully automatic or interactive. Interactive runs will be limited to 15 elapsed minutes per search.
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmission.xml.
Here for download (though they may not display properly) are the DTD for search results of one run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed.
Please submit each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSearchResults element even if only one run is included.
Submissions should be transmitted to NIST via this webpage.
This task will be treated as a form of search and will accordingly be evaluated with average precision for each topic in each run and per-run mean average precision over all topics. Speed will also be measured: clock time per topic search, reported in seconds (to one decimal place).
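For reference, here is a minimal sketch of non-interpolated average precision over one ranked shot list and mean average precision over all topics; the exact conventions NIST applies (e.g., treatment of unjudged shots in the pools) may differ.

    # Sketch: average precision for one topic and mean average precision over topics.
    def average_precision(ranked_shots, relevant):
        hits, precisions = 0, []
        for rank, shot in enumerate(ranked_shots, start=1):
            if shot in relevant:
                hits += 1
                precisions.append(hits / rank)
        return sum(precisions) / len(relevant) if relevant else 0.0

    def mean_average_precision(run, qrels):
        """run: topic -> ranked shot list; qrels: topic -> set of relevant shots."""
        return sum(average_precision(run[t], qrels[t]) for t in qrels) / len(qrels)

    print(average_precision(["s3", "s1", "s9", "s4"], {"s1", "s4"}))   # 0.5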
Video is becoming a new means of documenting everything from recipes to how to change a car tire. Ever-expanding multimedia video content necessitates development of new technologies for retrieving relevant videos based solely on the audio and visual content of the video. Participating MED teams will create a prototype system that quickly finds events in a large collection of search videos.
Detailed information on the MED system tasks, data, submission process, and evaluation can be found in the 2014 combined MED/MER Evaluation Plan. See the 2014 Multimedia Event Detection task webpage.
Participating MER teams will create a MED system that not only searches for and detects an event in videos but also recounts the evidence for the presence of that event in each search video determined to contain it. A recounting specifies what key evidence defines the event and how this evidence should be combined to form the event detection score. It should do so clearly and concisely such that English-speaking NIST judges can readily understand it.
Detailed information on the MER system tasks, data, submission process, and evaluation can be found in the combined 2014 MED/MER Evaluation Plan. See the 2014 Multimedia Event Detection task webpage.
The following are the target dates for 2014:
Here is a list of work items that must be completed before the guidelines are considered final.
Pat Doe <patd@example.com>
Once subscribed, you can post to this list by sending your thoughts as email to tv14.list@nist.gov, where they will be sent out to EVERYONE subscribed to the list, i.e., all the other active participants.