TRECVID 2008 Guidelines

Guidelines for the TRECVID 2008 Evaluation

(last updated: )

0. Table of Contents:

Introduction
Video data
Data license agreements for active participants
System task details
Submissions and evaluations
Schedule
Outstanding 2008 guideline work items
Information for active participants
Contacts

1. Introduction:

The main goal of the TREC Video Retrieval Evaluation (TRECVID) is to promote progress in content-based analysis of and retrieval from digital video via open, metrics-based evaluation. TRECVID is a laboratory-style evaluation that attempts to model real world situations or significant component tasks involved in such situations.

In 2006 TRECVID completed the second two-year cycle devoted to automatic segmentation, indexing, and content-based retrieval of digital video - broadcast news in English, Arabic, and Chinese. It also completed two years of pilot studies on exploitation of unedited video (rushes). Some 70 research groups have been provided with the TRECVID 2005-2006 broadcast news video and many resources created by NIST and the TRECVID community are available for continued research on this data independent of TRECVID. See the "Past data" section of the TRECVID website for pointers.

In 2007 TRECVID began exploring new data (cultural, news magazine, documentary, and education programming) and an additional, new task - video rushes summarization. In 2008 that work will continue with the exception of the shot boundary detection task, which will be retired. In addition TRECVID plans to organize two new task evaluations.

TRECVID 2008 will test systems on the following tasks:

surveillance event detection
high-level feature extraction
search (interactive, manually-assisted, and/or fully automatic)
rushes summarization
content-based copy detection

For past participants, here are some changes to note:

We expect to increase the number of topics for automatic search runs to ~ 50. Manual and interactive runs will use a subset of 24 of the 50. For automatic runs only, this will entail evaluating all search runs using a 50% sample of the pooled submissions and inferred average precision rather than average precision - as has been done for the feature task since 2006.
The upper limit for the duration of each interactive or manual search will be reduced from 15 minutes to 10 minutes.
The upper limit for video summaries will be reduced from 4% of the duration of the full video to 2%.
TRECVID will continue to emphasize search for events (object+action) not easily captured in a single frame as opposed to searching for static objects.
While mastershots will be defined as units of evaluation, keyframes or annotation of keyframes will not be provided by NIST. This will require groups to look afresh at how best to train their systems - tradeoffs between processing speed, effectiveness, amount of the video processed. As in the past, participants may want to team up to create training resources.
The degree to which systems trained on broadcast news generalize with varying amounts of training data to a related but different genre will be a focus of TRECVID 2008.

2. Video data:

A number of datasets are available for use in TRECVID 2008. We describe them here and then indicate below which data will be used for development versus test for each task.

Sound and Vision

The Netherlands Institute for Sound and Vision has generously provided news magazine, science news, news reports, documentaries, educational programming, and archival video in MPEG-1 for use within TRECVID.
In 2007 we used about 50 hours for development and 50 hours for search and feature test. These 100 hours will be available as development data for the search and feature tasks in 2008. There will be about another 100 hours for use as test data for the feature and search tasks..
- ~ 100 hours for development of search and feature detection
- ~ 100 hours for test of search and feature detection
The copy detection task will use all 200 hours as test data; for development data under "MUSCLE-VCD-2007" data below.
Tasks: search, feature extraction, and copy detection
Distribution: by download from password-protected servers at NIST and elsewhere.
Training truth data for search and feature tasks:
- feature annotations of the 2007, 2005, and 2003 data are available from the "Past data" section of the TRECVID website.
- a community effort to annotate the 2008 development data (for different features than in 2007) may be organized if there is sufficient interest.
Master shot reference: Christian Petersohn at the Fraunhofer (Heinrich Hertz) Institute in Berlin has again provided the master shot reference. Please use the following reference in your papers:
```
C. Petersohn. "Fraunhofer HHI at TRECVID 2004:  Shot Boundary Detection System", 
TREC Video Retrieval Evaluation Online Proceedings, TRECVID, 2004
URL: www-nlpir.nist.gov/projects/tvpubs/tvpapers04/fraunhofer.pdf
```
Code developed by Peter Wilkins and Kirk Zhang at Dublin City University will be used to format the reference. The method used in 2005/6 and to be repeated with the data for 2008 is described here.
Automatic speech recognition (Dutch): The University of Twente has offered to provide the output of an automatic speech recognition system on the Sound and Vision data. Please use the following reference in your papers:
```
Marijn Huijbregts, Roeland Ordelman and Franciska de Jong, Annotation
of Heterogeneous Multimedia Content Using Automatic Speech
Recognition. in Proceedings of SAMT, December 5-7 2007, Genova, Italy
```
Machine translation (Dutch to English): Christof Monz of Queen Mary, University London will again contribute machine translation (Dutch to English) for the Sound and Vision video (ASR output or speech)
Keyframes: NIST will not be supplying keyframes for the Sound and Vision video. This will require groups to look afresh at how best to train their systems - tradeoffs between processing speed, effectiveness, amount of the video processed.
Restrictions on the use of development and test data: You must read this

BBC rushes

The BBC Archive has provided unedited material in MPEG-1 from about five dramatic series for use within TRECVID.
In 2007 we used about 18 hours (43 videos) for development and about 17 hours (42 videos) for testing. All of these videos and the submitted summaries will be available as development data for 2008. There will be another 18 hours (40 videos) for use as test data for video summarization.
- ~ 35 hours (57 clips) for rushes summarization development
- ~ 18 hours (40 clips) for rushes summarization test
Tasks: video summarization
Distribution: by download from password-protected servers at NIST and elsewhere.
Training truth data: the submitted summaries from 2007 and the ground truth for 2007 are available to participants.

TRECVID 2008 surveillance video

The UK Home Office Scientific Development Branch has provided surveillance video for use by TRECVID in 2008. It comprises about 100 hours - the output of 5 cameras from the same period of 20 hours (2 hours per day over 10 days).
Tasks: surveillance event detection.
Distribution: on hard drives; possibly by download
Training truth data: annotations for training data will be provided to participants.

Further information about the data is available here.

MUSCLE-VCD-2007

For development data, the copy detection task will use the MUSCLE-VCD-2007 data. This is the data that was used for the copy detection evaluation at CIVR 2007

3. Data license agreements for active participants

In order to be eligible to receive the data, you must have have applied for participation in TRECVID. Your application will be acknowledged by NIST with information about how to obtain the data. Then you will need to complete the relevant permission forms (from the active participant's area) and fax them (Attention: Lori Buckland) to in the US. Include a cover sheet with your fax that identifies you, your organization, your email address, and each kind of data you are requesting. Alternatively you may email a well-identified pdf of each signed form to Please ask only for the test data (and optional development data) required for the task(s) you apply to participate in and intend to complete. One permission form will cover 2007 and 2008 BBC data. One permission form will cover 2007 and 2008 Sound and Vision data.

Surveillance event detection -> TRECVID 2008 airport surveillance video

development data
test data

Search, feature extraction, copy detection -> Sound and Vision
- optional for copy detection: development data (available separately from MUSCLE-VCD-2007 webpage. Do not request from NIST.
Rushes summarization -> BBC rushes
- test data
- optional: 2008 development data (= 2007 BBC development + test data)

4. System task details:

4.1 Surveillance event detection:

The guidelines for this task have been developed with input from the research community. Given 100 hours of surveillance video (50 hours training, 50 hours test) the task is to detect 3 or more events from the required event set and identify their occurrences temporally. Systems can make multiple passes before outputting a list of putative event observations (i.e., this is a retrospective detection task). Besides the retrospective task, participants may alternatively choose to do a "free style" analysis of the data. Further information about the tasks may be found at the following web sites:

Overview Tasks

4.2 High-level feature extraction:

Various high-level semantic features, concepts such as "Indoor/Outdoor", "People", "Speech" etc., occur frequently in video databases. The proposed task will contribute to work on a benchmark for evaluating the effectiveness of detection methods for semantic concepts

The task is as follows: given the feature test collection, the common shot boundary reference for the feature extraction test collection, and the list of feature definitions (see below), participants will return for each feature the list of at most 2000 shots from the test collection, ranked according to the highest possibility of detecting the presence of the feature. Each feature is assumed to be binary, i.e., it is either present or absent in the given reference shot.

All feature detection submissions will be made available to all participants for use in the search task - unless the submitter explicitly asks NIST before submission not to do this.

Description of high-level features to be detected:

The descriptions are those used in the common annotation effort. They are meant for humans, e.g., assessors/annotators creating truth data and system developers attempting to automate feature detection. They are not meant to indicate how automatic detection should be achieved.

If the feature is true for some frame (sequence) within the shot, then it is true for the shot; and vice versa. This is a simplification adopted for the benefits it affords in pooling of results and approximating the basis for calculating recall.

NOTE: In the following, "contains x" is short for "contains x to a degree sufficient for x to be recognizable as x to a human" . This means among other things that unless explicitly stated, partial visibility or audibility may suffice.

NOTE: NIST will instruct the assessors during the manual evaluation of the feature task submissions as follows. The fact that a segment contains video of physical objects representing the topic target, such as photos, paintings, models, or toy versions of the topic target, should NOT be grounds for judging the feature to be true for the segment. Containing video of the target within video may be grounds for doing so.

Selection of high-level features to be detected:

In 2008, participants in the high-level feature task must submit results for all 20 of the following features. NIST will then choose 10-20 of the features and evaluate submissions for those. The features were drawn from the large LSCOM feature set so as to be appropriate to the Sound and Vision data used in the feature and search tasks. Some feature definitions were enhanced for greater clarity, so it is important that the TRECVID feature descriptions be used and not the LSCOM descriptions.

Here is the final list of features for evaluation together with their brief descriptions and some general rules of interpretation

4.3 Search:

Search is high-level task which includes at least query-based retrieval and browsing. The search task models that of an intelligence analyst or analogous worker, who is looking for segments of video containing persons, objects, events, locations, etc. of interest. These persons, objects, etc. may be peripheral or accidental to the original subject of the video. The task is as follows: given the search test collection, a multimedia statement of information need (topic), and the common shot boundary reference for the search test collection, return a ranked list of at most 1000 common reference shots from the test collection, which best satisfy the need. Please note the following restrictions for this task:

TRECVID 2008 will accept fully automatic search submissions (no human input in the loop) as well as manually-assisted and interactive submissions as illustrated graphically below

Because the choice of features and their combination for search is an open research question, no attempt will be made to restrict groups with respect to their use of features in search. However, groups making manually-assisted runs should report their queries, query features, and feature definitions.

Every submitted run must contain a result set for each topic.

One baseline run will be required of every manually-assisted system as well one for every automatic system
- A run based only on the text from the (English and/or Dutch) ASR/MT output provided by NIST and on the text of the topics.

In order to maximize comparability within and across participating groups, all manually-assisted runs within any given site must be carried out by the same person.

An interactive run will contain one result for each and every topic, each such result using the same system variant. Each result for a topic can come from only one searcher, but the same searcher does not need to be used for all topics in a run. Here are some suggestions for interactive experiments.

The searcher should have no experience of the topics beyond the general world knowledge of an educated adult.

The search system cannot be trained, pre-configured, or otherwise tuned to the topics.

The maximum total elapsed time limit for each topic (from the time the searcher sees the topic until the time the final result set for that topic is returned) in an interactive search run will be 10 minutes. For manually-assisted runs the manual effort (topic to query translation) for any given topic will be limited to 10 minutes.

All groups submitting search runs must include the actual elapsed time spent as defined in the videoSearchRunResult.dtd.

Groups carrying out interactive runs should measure user characteristics and satisfaction as well and report this with their results, but they need not submit this information to NIST. Here is some information about the questionnaires the Dublin City University team used in 2004 to collect search feedback and demographics from all groups doing interactive searching. Something similar will be done again this year, with details to be determined once participation is known.

In general, groups are reminded to use good experimental design principles. These include among other things, randomizing the order in which topics are searched for each run so as to balance learning effects.

Supplemental interactive search runs, i.e., runs which do not contribute to the pools but are evaluated by NIST, will be allowed to enable groups to fill out an experimental design. Such runs must not be mixed in the same submission file with non-supplemental runs. This is the only sort of supplemental run that will be accepted.

4.4 Rushes summarization:

Rushes are the raw material (extra video, B-rolls footage) used to produce a video. 20 to 40 times as much material may be shot as actually becomes part of the finished product. Rushes usually have only natural sound. Actors are only sometimes present. So very little if any information is encoded in speech. Rushes contain many frames or sequences of frames that are highly repetitive, e.g., many takes of the same scene redone due to errors (e.g. an actor gets his lines wrong, a plane flies over, etc.), long segments in which the camera is fixed on a given scene or barely moving,etc. A significant part of the material might qualify as stock footage - reusable shots of people, objects, events, locations, etc. Rushes may share some characteristics with "ground reconnaissance" video.

The system task in rushes summarization will be, given a video from the rushes test collection, to automatically create an MPEG-1 summary clip less than or equal to a maximum duration (to be determined) that shows the main objects (animate and inanimate) and events in the rushes video to be summarized. The summary should minimize the number of frames used and present the information in ways that maximizes the usability of the summary and speed of objects/event recognition.

Such a summary could be returned with each video found by a video search engine much text search engines return short lists of keywords (in context) for each document found - to help the searcher decide whether to explore a given item further without viewing the whole item. It might be input to a larger system for filtering, exploring and managing rushes data.

Although in this task we limit the notion of visual summary to a single clip that will be evaluated using simple play and pause controls, there is still room for creativity in generating the summary. Summaries need not be series of frames taken directly from the video to be summarized and presented in the same order. Summaries can contain picture-in-picture, split screens, and results of other techniques for organizing the summary. Such approaches will raise interesting questions of usability.

The summarization of BBC rushes will be run as a workshop at the ACM Multimedia Conference in Vancouver, Canada during the last week of October 2008.

4.5 Content-based copy detection:

As used here, a copy is a segment of video derived from another video, usually by means of various transformations such as addition, deletion, modification (of aspect, color, contrast, encoding, ...), camcording, etc. Detecting copies is important for copyright control, business intelligence and advertisment tracking, law enforcement investigations, etc. Content-based copy detection offers an alternative to watermarking. The TRECVID copy detection task will be carried out in collaboration with members of the IMEDIA team at INRIA and will build on work demonstrated at CIVR 2007.

Required task

The required system task will be as follows: given a test collection of videos and a set of about 2000 queries (video-only segments), determine for each query the place, if any, that some part of the query occurs, with possible transformations, in the test collection. The set of possible transformations will be based to the extent possible on actually occurring transformations.

Each query will be constructed using tools developed by IMEDIA to include some randomization at various decision points in the construction of the query set. For each query, the tools will take a segment from the test collection, optionally transform it, embed it in some video segment which does not occur in the test collection, and then finally apply one or more transformations to the entire query segment. Some queries may contain no test segment; others may be composed entirely of the test segment. Video transformations to be used are documented in the general plan for query creation. and in the final video transformations document with examples..

Optional tasks

Videos often contain audio. Sometimes the original audio is retained in the copied material, sometimes it is replaced by a new soundtrack. Nevertheless, audio is an important and strong feature for some application scenarios of video copy detection. Since detection of untransformed audio copies is relatively easy, and the primary interest of the TV community is in video analysis, it was decided to model the required CD task with video-only queries. However, since audio is of importance for practical applications, there will be two additional optional tasks: a task using transformed audio-only queries and one using transformed audio+video queries.

The audio-only queries will be generated along the same lines as the video-only queries: a set of 201 base audio-only queries is transformed by several techniques that are intended to be typical of those that would occur in real reuse scenarios: (1) bandwidth limitation (2) other coding-related distortion (e.g. subband quantization noise) (3) variable mixing with unrelated audio content. The transformed queries will be downloadable from NIST.

The audio+video queries will consist of the aligned versions of transformed audio and video queries, i.e, they will be various combinations of transformed audio and transformed video from a given base audio+video query. In this way sites can study the effectiveness of their systems for individual audio and video transformations and their combinations. These queries will not be downloadable. Rather, NIST will provide a list of how to construct each audio+video test query so that given the audio-only queries and the video-only queries, sites can use a tool such as ffmpeg to construct the audio+video queries.

Please watch the schedule for information soon about the sequence of query releases and results due dates.

5. Submissions and Evaluations:

Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.SAXCount -v YourSubmision.xml.

The results of the evaluation will be made available to attendees at the TRECVID workshop and will be published in the final proceedings and/or on the TRECVID website within six months after the workshop. All submissions will likewise be available to interested researchers via the TRECVID website within six months of the workshop.

5.1 Surveillance event detection pilot

Submissions

The guidelines for submission are currently being developed.

Further information on submissions may be found here.

Evaluation

Output from systems will first be aligned to ground truth annotations, then scored for misses / false alarms. Since error is a tradeoff between probability of miss vs. rate of false alarms, this task will use the Normalized Detection Cost Rate (NDCR) measure for evaluating system performance. NDCR is a weighted linear combination of the system's Missed Detection Probability and False Alarm Rate (measured per unit time).

Further information about the evaluation measures may be found here.

5.2 High-level feature extraction

Submissions

Each group may submit up to 6 total runs. All runs must be prioritized and all will be evaluated.

TRECVID 2008 will require a feature run treating the new video as if no automatic speech recognition (ASR) or machine translation (MT) for the languages of the videos (mostly Dutch) existed - as might occur in the case of video in other less well known languages.

For each feature in a run, participants will return at most 2000.

Here is a DTD for feature extraction results of one run, one for results from multiple runs, and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed
Submissions will be transmitted to NIST via a webpage. Details will be available well before the submission deadline.
Each run must contain results for all features listed above

Evaluation

The unit of testing and performance assessment will be the video shot as defined by the track's common shot boundary reference. The submitted ranked shot lists for the detection of each feature will be judged manually as follows. We will take all shots down to some fixed depth (in ranked order) from the submissions for a given feature - using some fixed number of runs from each group in priority sequence up to the median of the number of runs submitted by any group. We will then merge the resulting lists and create a list of unique shots. These will be judged manually down to some depth to be determined by NIST based on available assessor time and number of correct shots found. NIST will maximize the number of shots judged within practical limits. We will then evaluate each submission to its full depth based on the results of assessing the merged subsets. This process will be repeated for each feature.
If the feature is perceivable by the assessor for some frame (sequence) however short or long then, then we'll assess it as true; otherwise false. We'll rely on the complex thresholds built into the human perceptual systems. Search and feature extraction applications are likely - ultimately - to face the complex judgment of a human with whatever variability is inherent in that.
Runs will be compared based on a sample of the submission pools. Precision-recall curves based on the sample will be used as well as inferred average precision, which provides a good estimate of average precision - a single-valued combination of precision, recall, and ranking ability.

5.3 Search

Submissions

Each group may submit up to 6 total runs. All runs must be prioritized and all will be evaluated.

Each interactive run will contain one result for each and every topic using the system variant for that run. Each result for a topic can come from only one searcher, but the same searcher does not need to be used for all topics in a run. If a site has more than one searcher's result for a given topic and system variant, it will be up to the site to determine which searcher's result is included in the submitted result. NIST will try to make provision for the evaluation of supplemental results, i.e., ones NOT chosen for the submission described above. Details on this will be available by the time the topics are released.

For manual and automatic systems, TRECVID 2008 will require:

A run based only on the text from the (English and/or Dutch) ASR/MT output provided by NIST and on the text of the topics.

For each topic in a run, participants will return the list of at most 1000 shots.

Here is a

DTD for search results of one run

results from multiple runs

small example

Submissions will be transmitted to NIST via a webpage. Details will be available well before the submission deadline.

Evaluation

The unit of testing and performance assessment will be the video shot as defined by the track's common shot boundary reference. The submitted ranked lists of shots found relevant to a given topic will be judged manually as follows. We will take all shots down to some fixed depth (in ranked order) from the submissions for a given topic - using some fixed number of runs from each group in priority sequence up to the median of the number of runs submitted by any group. We will then merge the resulting lists and create a list of unique shots. A random sample of these pools will be taken and this sample will be judged manually to some depth to be determined by NIST based on available assessor time and number of correct shots found. NIST will maximize the number of shots judged within practical limits. We will then evaluate each submission to its full depth based on the results of assessing the merged subsets. This process will be repeated for each topic.
Per-search measures:

inferred average precision (definition below)
elapsed time (for all runs)

Per-run measure:

mean inferred average precision:

inferred average precision

average precision

mean average precision (MAP)

TREC-10 Proceedings appendix on common evaluation measures

5.4 Rushes summarization

Submissions

Each participating group will be allowed to submit up to two runs, one with priority 1 and one with priority 2. Depending on the number of submissions, we may not be able to judge all submitted runs. If so we will do them in priority order. If you do not need to submit two distinct runs, please do not. Each run will contain one MPEG-1 summary clip (same frame size and frame rate as the original videos) for each of the test rushes videos along with the system time (in seconds) needed to create the summary starting only with the video to be summarized.
For practical reasons in planning the assessment we need an upper limit on the size of the summaries. Also, some very long summaries make no sense for a given use scenario. But you can imagine many scenarios to motivate various answers. One might involve passing the summary to downstream applications that support, clustering, filtering, sophisticated browsing for rushes exploration, management, reuse. Minimal emphasis on compression.
Assuming we want the summary to be directly usable by a human, then at least the summary should be usable by a professional, looking for reusable material, and willing to watch a summary longer than someone with more recreational goals.
Therefore we'll allow longer summaries than a recreational user would tolerate but score results so that systems that can meet a higher goal (much shorter summary) get rewarded - e.g., present mean-fraction-of-ground-truth-items-included versus duration-of-the-summary or calculate ratio.
Each submitted summary will have a duration which is at most 2% of the video to be summarized. Remember 2% is not a goal - it is just an UPPER limit on size.
How to submit summaries:

The primary method for submitting summaries to NIST will be as follows. Each group will create one file containing a list of URLs - one URL per line for each summary they are submitting. If the group is submitting only one run then the URL file will contain 40 URL lines; if two runs, then it will contain 80 URL lines.
The first two lines of the URL file will contain the userid (on line 1) and the password (on line 2) to be used to access the summaries. We expect the summaries to be in a protected (non-spidered) directory. The scheme in each URL can be "http" or "ftp". For example:
http://HOST/PATH/TO/SUMMARIES/1.MS237650.sum.mpg

Please name your test summaries *exactly* the same as the file containing the video being summarized *except* with the priority ( "1" or "2") and ".sum" inserted before the ".mpg". For example, the priority 1 summary of test file MS237650.mpg should be called 1.MS237650.sum.mpg by every group. NIST will add a unique group prefix here.
NIST will provide webpage each group can use to identify itself, provide a contact email address, and type in the name of their URL file for upload to NIST. Once the URL file has been uploaded, it will be checked for simple errors and a message sent to the browser. After that NIST will proceed to use the URLs to upload each summary. An email will be sent to the submitting person as each summary is uploaded. This will allow the submitter to see the progress and provide detailed information about which, if any, uploads failed.
Although a little more complicated than last year, we hope the method will be less labor-intensive than last year's method which required each summary's name to be entered individually and uploaded before going on to the next.
If you cannot make use of the primary submission method described above you must notify NIST immediately so we can arrange for you to use last year's method for submission. In which case, you will need to leave more time for submission.

In the body of an email with your short team name in the subject, please send to the timing information for your summaries. At the top of the file place the following information:
Operating system CPU type Memory
Then include a line for each summary with the elapsed time in seconds to create that summary. For example:
Short_team_ID Time(s) Priority Video_being_summarized Brno 469.36 1 MRS035126.mpg Brno 443.74 1 MRS042538.mpg Brno 573.94 1 MRS043405.mpg Brno 665.83 1 MRS044497.mpg Brno 470.94 1 MRS044499.mpg Brno 869.20 1 MRS044725.mpg Brno 369.14 1 MRS044728.mpg ...

Evaluation

At Dublin City University, with support from the European Commission under contract FP6-027026 (K-Space), all the summary clips for a given video will be viewed using mplayer on Linux in a window 125mm x 102mm @ 25 fps in a randomized order by a single human judge. In a timed process, the judge will play / pause the video as needed to determine as quickly as possible which of the objects and events listed in the ground truth for the video to be summarized are present in the summary.
The judge will also be asked to assess the usability/quality of the summary. Included will be at least something like the following with 5 possible answers for each - where only the extremes are labeled: "Strongly agree" and "strongly disagree".
- The summary contains nearly identical segments.
- The summary contains color bars, clapboards, and/or totally black or totally white frames.
- The summary is presented in a pleasant tempo and rhythm.
This process will be repeated for each test video. If possible we will have more than one human evaluate at least some of the videos.
Per-summary measures:

fraction of the ground truth objects/events found in the summary (to estimate recall)
time (in seconds) needed to check summary against ground truth
duration of the summary (to estimate precision and act as a normalization factor for other measures)
system time (in seconds) to generate the summary
usability/quality scores

Per-system measures:
- Means of the above across all test videos (in relation to median/max for all systems

Carnegie Mellon University will again provide a simple baseline system to produce summaries within the 2% maximum. The baseline algorithm simply presents the entire video at 50x normal speed.

5.5 Content-based copy detection pilot

Submissions

Information on submission of copy detection runs has been collected in a separate document.

Evaluation

Information on evaluation of copy detection runs has been collected in a separate document.

6. Schedule:

The following are the target dates for 2008.

The schedule for the surveillance event detection task listed at the end of this document .

Just below is the proposed schedule for work on the BBC rushes
summarization task that will be held as a workshop at the ACM
Multimedia Conference in Vancouver, Canada during the last week of
October 2008. Results will be summarized at the TRECVID workshop in
November. Papers reporting participants' summarization that are not
included in the ACM Multimedia Worhshop proceedings should be
submitted for inclusioni in the TRECVID workshop notebook.

  1   Apr  test data available for download
  5   May  system output submitted to NIST for judging at DCU
  1   Jun  evaluation results distributed to participants
 28   Jun  papers (max 5 pages) due in ACM format 
           The organizers will provide an intro paper with information 
           about the data, task, groundtruthing, evaluation, measures, etc.
 15   Jul  acceptance notification  
  1   Aug  camera-ready papers due via ACM process
 31   Oct  video summarization workshop at ACM Multimedia '08, Vancouver, BC, Canada

1. Feb: NIST sends out Call for Participation in TRECVID 2008
22. Feb: Applications for participation in TRECVID 2008 due at NIST
1 Mar: Final versions of TRECVID 2007 papers due at NIST
1. Apr: Guidelines complete
11. Apr: Extended participant decision deadline for event detection task
April: Download of feature/search development data
June: Download of feature/search test data
30. June: Video-only copy detection queries available for download
1. Aug: Video-only copy detection submissions due at NIST for evaluation
Audio-only copy detection queries avilable for download
8. Aug: Search topics available from TRECVID website.
15. Aug: Feature extraction tasks submissions due at NIST for evaluation.
Feature extraction donations due at NIST
22. Aug: Feature extraction donations available for active participants
25. Aug - 12. Sep: Feature assessment at NIST
29. Aug: Audio-only copy detection submissions due at NIST
Audio+video copy detection query plans available for download
12. Sep: Search task submissions due at NIST for evaluation
19. Sep: Results of feature extraction evaluations returned to participants
22. Sep - 10. Oct: Search assessment at NIST
1. Oct: Audio+video copy detection submissions due at NIST for evaluation
Video-only and audio-only copy detection results returned to participants
9. Oct: Audio+video copy detection results returned to participants
TRECVID workshop registration opens
15. Oct: Results of search evaluations returned to participants
15. Oct: Results of search evaluations returned to participants
20. Oct: Speaker proposals due at NIST
27. Oct: Notebook papers due at NIST
1. Nov: Copyright forms due back at NIST (see Notebook papers for instructions)
10. Nov: TRECVID 2008 Workshop registration closes
17,18 Nov: TRECVID Workshop at NIST in Gaithersburg, MD
15. Dec: Workshop papers publicly available (slides added as they arrive)
1. Mar 2009: Final versions of TRECVID 2008 papers due at NIST

7. Outstanding 2008 guideline work items

Here is a list of work items that must be completed before the guidelines are considered to be final..

Choose features to evaluate - more appropriate to S&V data, better definitions.
DONE. See above
Poll participants for interest in community effort for creation of more/better S&V training data
UNDERWAY.
Agree on final measures for copy-based detection task
DONE. See above.
Decide how to handle search systems, if any, outside the rules (collaborative, etc.)
DONE. Few, if any, likely. Handle early on an individual basis.
Should interactive and manual search results mark shots a human has directly selected?
DONE. No interest pro or con was expressed so no change will be made to the search submissions.
Decide whether for interactive search runs to require reporting of a search time offset for each shot found by a human searcher.
DONE. This is already available as an optional attribute for each search result time (see the search results dtd).
Decide on best way to collect run metadata (in submission XML, separate web form, structured paper abstract,...)
DEFERRED until summer

8. Information for active participants

9. Contacts:

Coordinators:

NIST contact:

Email lists:

Information and discussion for active workshop participants

[email protected]
archive open to active participants only
NIST will subscribe the contact listed in your application to participate when we have received it. Additional members of active participant teams will be subscribed by NIST if they send email to indicating they want to be subscribed, the email address to use, their name, and providing the TRECVID 2008 active participant's password. Groups may combine the information for multiple team members in one email.
Once subscribed, you can post to this list by sending you thoughts as email to [email protected], where they will be sent out to everyone subscribed to the list, i.e., the other active participants.

Information and discussion on the surveillance event detection task/li>
- [email protected]
- If you would like to subscribe, see the event detection webpage for contact information.

National Institute of
Standards and Technology Home

Last updated:
Date created: Tuesday, 3-Dec-07
For further information contact