In 2007 the TREC Video Retrieval Workshop series will run a 1-day workshop on video summarization of unedited BBC video (rushes) as part of the ACM Multimedia Conference 2007 in Augsburg, Germany, on Friday, 28 September 2007.
PLEASE NOTE: Use of the BBC data and presentations at the ACM workshop (oral, poster, demo) will be limited to groups who are already active participants in TRECVID 2007 and have completed submissions for the TRECVID video summarization task. But attendance at the ACM workshop and participation in discussions will be open to all who sign up for the ACM workshop.
Compact visual surrogates for videos, ones that give a good indication of what the videos are about, have many potential uses even when generic. The highly redundant nature of rushes and their potential value for reuse and repurposing make them a good target for summarization. But such summarization is difficult and evaluation of summaries, whether intrinsic or extrinsic, is known to be complicated and costly. This ACM workshop is a feasibility test not only of some video summarization approaches but of a particular evaluation framework and aims to set some baselines and encourage progress in both.
Rushes are the raw material (extra video, B-roll footage) used to produce a video. 20 to 40 times as much material may be shot as actually becomes part of the finished product. Rushes usually have only natural sound. Actors are only sometimes present. So very little if any information is encoded in speech. Rushes contain many frames or sequences of frames that are highly repetitive, e.g., many takes of the same scene redone due to errors (e.g., an actor gets his lines wrong, a plane flies over, etc.), long segments in which the camera is fixed on a given scene or barely moving, etc. A significant part of the material might qualify as stock footage - reusable shots of people, objects, events, locations, etc. Rushes may share some characteristics with "ground reconnaissance" video.
The BBC Archive has provided about 100 hours of unedited material in MPEG-1 from about five dramatic series. Most of the videos have durations of about 30 minutes. Half the videos will be used for systems development and half reserved for system test.
Sample ground truth - lists of important segments identified by major objects/events - will be created by Dublin City University for some development clips and provided with the development data. These will be just examples and are not intended as training data. The ground truth for the test data, created by the same process/people, will be the basis for the evaluation.
The system task in rushes summarization will be, given a video from the rushes test collection, to automatically create an MPEG-1 summary clip less than or equal to 4% of the original video's duration. This means the average summary will be less than or equal to 60 seconds long. The summary should show the main objects (animate and inanimate) and events in the rushes video to be summarized. The summary should minimize the number of frames used and present the information in ways that maximize the usability of the summary and the speed of object/event recognition.
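The duration cap above is a simple proportion of the source video's length. As a minimal sketch (the function name and 4% default are illustrative, not part of any official tool):

```python
def max_summary_seconds(video_duration_s: float, cap_fraction: float = 0.04) -> float:
    """Maximum allowed summary duration: a fixed fraction (here 4%) of the original."""
    if video_duration_s <= 0:
        raise ValueError("video duration must be positive")
    return video_duration_s * cap_fraction

# A 25-minute (1500 s) rushes video yields a 60 s cap;
# a 30-minute (1800 s) video yields 72 s.
print(max_summary_seconds(1500))
print(max_summary_seconds(1800))
```

Since most of the test videos run around 30 minutes, the per-video cap lands in the vicinity of a minute, which is where the quoted 60-second average comes from.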
Such a summary could be returned with each video found by a video search engine, much as text search engines return short lists of keywords (in context) for each document found - to help the searcher (whether professional or recreational) decide whether to explore a given item further without viewing the whole item. It might also be input to a larger system for filtering, exploring and managing rushes data.
Although in this pilot task we limit the notion of visual summary to a single clip that will be evaluated using simple play and pause controls, there is still room for creativity in generating the summary. Summaries need not be series of frames taken directly from the video to be summarized and presented in the same order. Summaries can contain picture-in-picture, split screens, and results of other techniques for organizing the summary. Such approaches will raise interesting questions of usability.
Each participating TRECVID group will submit to NIST one MPEG-1 summary clip for each of the test rushes videos, the system time (in seconds) needed to create the summary starting only with the video to be summarized, and the number of frames in the summary clip.
At NIST, all the summary clips for a given video will be viewed in a randomized order by a single human judge. In a timed process, the judge will play, pause, stop, fast forward, rewind the video as needed to determine as quickly as possible which of the objects and events listed in the ground truth for the video to be summarized are present in the summary. The judge may also be asked to assess the usability of the summary. This process will be repeated for each test video.
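The core of the judgment above is checking which ground-truth items appear in a summary, which naturally yields a per-summary inclusion score. A minimal sketch of that scoring step (names and the set-based representation are illustrative assumptions; the actual NIST process also records judging time and usability assessments):

```python
def inclusion_fraction(ground_truth: set, found: set) -> float:
    """Fraction of the listed ground-truth objects/events the judge found in a summary."""
    if not ground_truth:
        raise ValueError("ground truth list must be non-empty")
    return len(ground_truth & found) / len(ground_truth)

# Hypothetical ground truth for one test video, and what one judge found:
gt = {"actor enters room", "dog on sofa", "car pulls away", "close-up of letter"}
print(inclusion_fraction(gt, {"dog on sofa", "car pulls away"}))  # 0.5
```

Comparing these fractions across systems for the same video, together with the recorded judging time, is what allows the summaries to be ranked within the evaluation framework described above.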
The TRECVID 2007 Video Summarisation Workshop Program at ACM Multimedia
Friday September 28, 2007. See the ACM Digital Library for the workshop proceedings.
08.30 - 09.20
The TRECVID 2007 BBC Rushes Summarization Evaluation Pilot
Paul Over (NIST), Alan F. Smeaton (DCU), Philip Kelly (DCU)
Richard Wright (BBC Archive)
09.20 - 09.40
Rushes Video Summarization by Object and Event Understanding
Feng Wang and Chong-Wah Ngo.
City University of Hong Kong
09.40 - 10.00
Rushes Summarization by Adaptive Acceleration and Stacking of Shots
Marcin Detyniecki and Christophe Marsala
Université Pierre et Marie Curie-Paris 6
10.00 - 10.30 Break
10.30 - 10.50
Feature Fusion and Redundancy Pruning for Rush Video Summarization
Jim Kleban, Anindya Sarkar, Emily Moxley, Stephen Mangiat, Swapna Joshi, Thomas Kuo and B.S. Manjunath
University of California, Santa Barbara
10.50 - 11.10
Video Summarization Preserving Dynamic Content
Francine Chen, Matthew Cooper and John Adcock
FX Palo Alto Laboratory
11.10 - 11.30
Generating Comprehensible Summaries of Rushes Sequences based on Robust Feature Matching
Ba Tu Truong and Svetha Venkatesh
Curtin University of Technology
11.30 - 11.50
Skimming Rushes Video Using Retake Detection
Werner Bailer, Felix Lee and Georg Thallinger
12.00 - 13.30 Lunch
13.30 - 14.45 Combined Poster & Demo session
Video Summarization at Brno University of Technology
Vítězslav Beran, Michal Hradiš, Adam Herout, Stanislav Sumec, Igor Potúček, Pavel Zemčík, Josef Mlích, Aleš Láník, Petr Chmelař
Brno University of Technology
Clever Clustering vs. Simple Speed-Up for Summarizing BBC Rushes
Alexander G. Hauptmann, Michael G. Christel, Wei-Hao Lin, Bryan Maher, Jun Yang, Robert V. Baron and Guang Xiang
Carnegie Mellon University
A User-Centered Approach to Rushes Summarisation Via Highlight-Detected Keyframes
Daragh Byrne, Peter Kehoe, Hyowon Lee, Ciarán O'Connaire, Alan F. Smeaton, Noel E. O'Connor and Gareth J.F. Jones
Dublin City University
Split-Screen Dynamically Accelerated Video Summaries
Emilie Dumont and Bernard Merialdo
Rushes Summarization with Self-Organizing Maps
Markus Koskela, Mats Sjöberg and Jorma Laaksonen
Helsinki University of Technology
National Institute of Informatics, Japan at TRECVID 2007: BBC Rushes Summarization
Duy-Dinh Le and Shin'ichi Satoh
National Institute of Informatics
NTU TRECVID-2007 Fast Rushes Summarization System
Chen-Ming Pan, Yung-Yu Chuang and Winston H. Hsu
National Taiwan University
THU-ICRC at Rush Summarization of TRECVID 2007
Tao Wang, Yue Gao, Jianguo Li, Patricia P. Wang, Xiaofeng Tong, Wei Hu, Yimin Zhang and Jianmin Li
Intel China Research Center and Tsinghua University
The Hong Kong Polytechnic University at TRECVID 2007 BBC Rushes Summarization
Yang Liu, Yan Liu and Yan Zhang
The Hong Kong Polytechnic University
Attention-based Video Summarisation in Rushes Collection
Reede Ren, P.Punitha and Joemon Jose
University of Glasgow
On-line Video Skimming Based on Histogram Similarity
Víctor Valdés and José M. Martínez
Universidad Autónoma de Madrid
14.45 - 15.30 Break
15.30 - 16.30 Plenary discussion on lessons learned, plans for 2008