Video Summarization (VSUM)

Task Coordinators: Keith Curtis and Gareth Jones

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, movies, TV shows, etc.) is to summarize the video in order to reduce its size and concentrate the high-value information in the video track. Starting in 2020, we are running a new video summarization track in TRECVID in which the task is to summarize the major life events of specific characters over a number of weeks of programming on the BBC Eastenders TV series. Typically, three characters will be chosen for this task every year, and summaries of their major life events must fall within the selected period of the show, which will be specified to participants in advance of the task.

The use case for this task is to generate an automatic summary, using a predefined maximum number of unique shots, of the significant life events of a given character from the Eastenders series over a given number of episodes. The generated summaries should be enough to gain a clear and concise overview of that character's major life events over the course of 8 to 12 weeks of programming in the series, and to see how they intertwine with the major life events of other specified characters in that time frame of the series.

System Task

Given a collection of BBC Eastenders test videos, a master shot boundary reference, a list of characters from the series, and a time frame of the series to use for summarization, summarize the major life events of each character within the specified time frame. Examples of what is more likely to count as a major life event: the birth of a child rather than a short illness, a divorce rather than an argument with a loved one, the passing of a loved one rather than the passing of someone loosely known to you, etc. Summaries are limited to a maximum number of unique shots, so the main challenge is to select the shots most likely to be considered major life events by human assessors.

Starting in 2020 there will be a new summarization task:

1- Main Task:

Systems will be asked to submit automatically generated summaries for three specified characters of the Eastenders series:
Time period limited to between 8 and 12 weeks of the series.
Videos of the series which can be used for summarization will be specified.
Maximum number of shots which can be used in summaries will be specified.
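
As a rough illustration of the shot budget described above, the following minimal Python sketch keeps the highest-scoring unique shots up to a given maximum. The function name, the shot IDs, and the relevance scores are all hypothetical; how a system actually scores shots for "major life event" relevance is left entirely to participants.

```python
from typing import Dict, List


def select_summary_shots(scores: Dict[str, float], max_shots: int) -> List[str]:
    """Pick at most max_shots unique shot IDs with the highest scores.

    `scores` maps a shot ID to a relevance score produced by some
    participant-defined model (hypothetical here).
    """
    if max_shots < 0:
        raise ValueError("max_shots must be non-negative")
    # Dictionary keys are unique, so each shot can be selected at most once.
    ranked = sorted(scores, key=lambda shot_id: scores[shot_id], reverse=True)
    return ranked[:max_shots]


# Hypothetical relevance scores for one character (illustrative values only).
example_scores = {
    "shot175_12": 0.91,
    "shot176_340": 0.15,
    "shot180_77": 0.78,
    "shot185_5": 0.42,
}
print(select_summary_shots(example_scores, max_shots=2))
# prints ['shot175_12', 'shot180_77']
```

A real system would probably also reorder the selected shots for playback and account for redundancy between shots; this sketch only shows the budget constraint.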

Sample Summarization Case: Heather

In the example of summarizing the major life events of Heather, the following is an example of the kind of questions likely to be put to human assessors as they rate the quality of summaries, followed by an example of the video clips which would answer those questions. Note that the answer does not have to be stated explicitly in the videos; it is enough that the clips can be said to answer those questions.

A - How was Heather taken to the hospital?

B - Why was she taken to the hospital?

C - What name does she give her child?

D - Who is the father of her child?

Data Resources

Dataset
About 244 video files (300 GB, 464 h) of BBC EastEnders video in MPEG-4 format. See here for information on how to get a copy of the test data.

300 GB, 464 h of the BBC Eastenders test data will be available from Dublin City University.
Auxiliary data: Participants are allowed to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the active participants mailing list as soon as they discover them.

Topics (Characters to Summarize):
Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos containing the person of interest in a variety of different appearances to the extent possible.
For each frame image (of a target person) there will be a binary mask of the region of interest (ROI), bounded by a single polygon, together with the ID from the master shot reference of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we will assume the searcher wants to keep the process simple. So, the ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. In addition to example images of the person of interest, the shot videos from which the images were taken will also be given as video examples.
Sharing of components:
- Docker image tools for development are available here. Contact the author Robert Manthey if you have questions about using them.
- We encourage teams to share development resources with other active participants to expedite system development.

Important Dates:

Please check the TRECVID 2020 schedule for important dates.

Run submission format:

Participants will submit results against the BBC Eastenders dataset in each run for all and only the 3 main characters chosen for the summarization task that year, within the time frame specified by NIST.
Each team may submit a maximum of 4 prioritized runs per submission.
Submissions will consist of the final automatically generated video summary for each topic, in .mp4 format, in addition to the XML container, as below, fully describing the run submissions.
Video summaries must be named <TEAM_NAME>_<RUN_Number>_<TOPIC>.mp4
For example, team SiriusCyberCo, submitting their second run, for topic Heather, must name their submission: SiriusCyberCo_2_Heather.mp4
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
Here for download (right click and choose "display page source" to see the entire file) is the DTD for summarization results of one run and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed.
Please submit the information for each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSummarizationResults element, even if only one run is included.
Submissions will be transmitted to NIST using this password-protected webpage.
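
The naming rule and the basic file requirements above can be sketched in Python as follows. This is only a quick local sanity check under stated assumptions: the helper names are hypothetical, the 1 to 4 run-number range is inferred from the four-run limit, and the DTD file name in the sample string is a placeholder. The standard-library parser used here checks well-formedness only; it does not validate against the DTD, so a validating checker such as the Xerces-J command mentioned above is still required.

```python
import xml.etree.ElementTree as ET


def summary_filename(team: str, run_number: int, topic: str) -> str:
    """Build <TEAM_NAME>_<RUN_Number>_<TOPIC>.mp4 for one summary video."""
    # Assumption: run numbers are 1..4, since at most 4 runs may be submitted.
    if not 1 <= run_number <= 4:
        raise ValueError("run_number must be between 1 and 4")
    return f"{team}_{run_number}_{topic}.mp4"


def looks_like_run_file(xml_text: str) -> bool:
    """Check for a DOCTYPE, well-formed XML, and the expected root element.

    This does NOT validate against the DTD.
    """
    if "<!DOCTYPE" not in xml_text:
        return False
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return root.tag == "videoSummarizationResults"


print(summary_filename("SiriusCyberCo", 2, "Heather"))
# prints SiriusCyberCo_2_Heather.mp4

# Placeholder document: the DTD file name is hypothetical and the real child
# elements (defined by the official DTD) are omitted.
sample = (
    '<!DOCTYPE videoSummarizationResults SYSTEM "videoSummarizationResults.dtd">'
    "<videoSummarizationResults></videoSummarizationResults>"
)
print(looks_like_run_file(sample))
# prints True
```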

Queries:

The following table specifies this year's query characters, the time frame of the series (Start Shot # and End Shot #), links to images of the query characters, and the maximum length and number of shots for each run.
Important: All participating teams should submit 4 runs for each query, using the specified maximum number of shots for each run.

Character	Janine	Ryan	Stacey
Start Shot #	shot175_1	shot175_1	shot175_1
End Shot #	shot185_1736	shot185_1736	shot185_1736
Images	Images	Images	Images
Max # Shots Run 1	5	5	5
Max Summary Length Run 1	150 seconds	150 seconds	150 seconds
Max # Shots Run 2	10	10	10
Max Summary Length Run 2	300 seconds	300 seconds	300 seconds
Max # Shots Run 3	15	15	15
Max Summary Length Run 3	450 seconds	450 seconds	450 seconds
Max # Shots Run 4	20	20	20
Max Summary Length Run 4	600 seconds	600 seconds	600 seconds
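
The limits in the table above lend themselves to an automatic pre-submission check. The sketch below is a minimal Python example, assuming that shot IDs have the form shot<video>_<index> and that they order first by video number and then by shot index; the function names and the per-shot durations passed in by the caller are hypothetical.

```python
import re
from typing import List, Tuple

SHOT_ID = re.compile(r"^shot(\d+)_(\d+)$")

# run number -> (max unique shots, max summary length in seconds), from the table
RUN_LIMITS = {1: (5, 150), 2: (10, 300), 3: (15, 450), 4: (20, 600)}


def parse_shot_id(shot_id: str) -> Tuple[int, int]:
    """Split e.g. 'shot175_1' into (175, 1). Assumed ID layout."""
    match = SHOT_ID.match(shot_id)
    if match is None:
        raise ValueError(f"unrecognized shot ID: {shot_id!r}")
    return int(match.group(1)), int(match.group(2))


def in_time_frame(shot_id: str, start: str = "shot175_1",
                  end: str = "shot185_1736") -> bool:
    """True if shot_id lies between start and end (inclusive)."""
    return parse_shot_id(start) <= parse_shot_id(shot_id) <= parse_shot_id(end)


def check_run(run_number: int, shots: List[Tuple[str, float]]) -> List[str]:
    """Return a list of problems for a run given (shot_id, seconds) pairs."""
    max_shots, max_seconds = RUN_LIMITS[run_number]
    problems = []
    shot_ids = [shot_id for shot_id, _ in shots]
    if len(set(shot_ids)) > max_shots:
        problems.append(f"more than {max_shots} unique shots")
    if sum(seconds for _, seconds in shots) > max_seconds:
        problems.append(f"summary longer than {max_seconds} seconds")
    outside = [s for s in shot_ids if not in_time_frame(s)]
    if outside:
        problems.append(f"shots outside the time frame: {outside}")
    return problems


# Hypothetical run 1 candidate: two shots with made-up durations.
print(check_run(1, [("shot175_1", 20.0), ("shot180_10", 30.0)]))
# prints []
```

An empty list means the candidate respects the shot count, length, and time frame limits for that run; it says nothing about summary quality.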

Evaluation:

In 2020, all submitted video summaries will be evaluated by assessors at Dublin City University.
A set of questions for each summary will be disseminated to assessors, but not to participants, for evaluation of summary content.
Summaries are also evaluated according to tempo, contextuality, and redundancy of generated video summaries:

Estimate the Tempo and Rhythm of this video summary, on a Likert scale of 1 - 7. High is best.
Estimate the Contextuality provided by this video summary, on a Likert scale of 1 - 7. High is best.
Estimate the level of Redundancy in this video summary, on a Likert scale of 1 - 7. Low is best.

Measures:

Scoring measures for summaries will be calculated from the content-based questions and also from the tempo, contextuality, and redundancy Likert scale estimates described above.
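
The exact scoring formula is not given here. Purely as an illustration of how the three Likert estimates could be brought onto a common "high is best" direction, the following Python sketch flips the redundancy rating and averages the three values. The function name and the choice of an unweighted mean are assumptions for this example, not the official measure.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 7


def likert_summary(tempo: int, contextuality: int, redundancy: int) -> float:
    """Average three 1-7 ratings after flipping redundancy (low is best).

    Illustrative only; not the official TRECVID scoring measure.
    """
    for value in (tempo, contextuality, redundancy):
        if not LIKERT_MIN <= value <= LIKERT_MAX:
            raise ValueError("Likert ratings must be between 1 and 7")
    # Map redundancy 1 -> 7 and 7 -> 1 so that higher is better for all three.
    flipped_redundancy = LIKERT_MIN + LIKERT_MAX - redundancy
    return float(mean([tempo, contextuality, flipped_redundancy]))


print(likert_summary(tempo=7, contextuality=7, redundancy=1))
# prints 7.0
```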

Important notes

The BBC requires all VSUM task participants to fill in, sign, and submit a renewed data license agreement in order to use the Eastenders data. That means that even if a past participant has a copy of the data, the team must submit a renewed license form before any submitted runs can be accepted and evaluated.
No prior human knowledge of the closed world of the Eastenders dataset may be used to filter content. All filtering methods must be automatic, without fine-tuning based on human knowledge of the Eastenders dataset.
Use of the included XML transcript files is limited to the transcribed text only, and not any other metadata or XML attributes (e.g., color of text).

Open Issues:

The BBC Eastenders data license agreement is still being coordinated with the BBC. All active participants will be informed when it is ready so that they can submit a signed data agreement and download the data.

Digital Video Retrieval at NIST