Video Summarization (VSUM)

Task Coordinators: Keith Curtis and Gareth Jones

An important need in many situations involving video collections (archive video search and reuse, personal video organization and search, movies, TV shows, etc.) is to summarize video in order to reduce its length while concentrating the high-value information it contains. In 2021 we continue the TRECVID video summarization track, in which the task is to summarize the major life events of specific characters over a number of weeks of programming of the BBC Eastenders TV series. Typically, five characters are chosen for the task each year, and their summaries must be drawn from a selected period of the show, which will be specified to participants in advance of the task.

The use case for this task is to generate an automatic summary, using a predefined maximum number of unique shots, of the significant life events of a given character from the Eastenders series over a given number of episodes. The generated summaries should be sufficient to give a clear and concise overview of that character's major life events over the course of 8 to 12 weeks of programming, and to show how they intertwine with the major life events of the other specified characters in that time frame.

System Task

Given a collection of BBC Eastenders test videos, a master shot boundary reference, a list of characters from the series, and a time frame of the series to use for summarization, summarize the major life events of each character within the specified time frame. For example, major life events are more likely to be the birth of a child than a short illness, a divorce than an argument with a loved one, or the passing of a loved one than the passing of someone loosely known to the character. Summaries are limited to a maximum number of unique shots, so the main challenge is to select the shots most likely to be considered major life events by human assessors.
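To make the shot-budget constraint concrete, here is a minimal Python sketch of one naive selection strategy. It assumes each candidate shot already carries a relevance score from some upstream model (producing that score is the hard part and is not shown), and it also enforces the per-run maximum summary length given in the Queries table. The `Shot` class, its fields, and `select_summary` are hypothetical names, and greedy ranking is only a simple baseline, not an official or recommended method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shot:
    shot_id: str      # master shot reference ID, e.g. "shot60_1"
    video: int        # assumed: video number parsed from the ID
    index: int        # assumed: shot number within that video
    duration: float   # shot length in seconds
    score: float      # hypothetical "major life event" relevance score


def select_summary(shots, max_shots, max_seconds):
    """Greedily keep the highest-scoring unique shots that fit the run limits,
    then return them in (video, index) order, assumed to be broadcast order."""
    chosen, seen, total = [], set(), 0.0
    for shot in sorted(shots, key=lambda s: s.score, reverse=True):
        if len(chosen) == max_shots:
            break
        # Skip repeated shots and shots that would exceed the length limit.
        if shot.shot_id in seen or total + shot.duration > max_seconds:
            continue
        chosen.append(shot)
        seen.add(shot.shot_id)
        total += shot.duration
    return sorted(chosen, key=lambda s: (s.video, s.index))
```

Returning the selected shots in series order rather than score order is one simple way to keep the storyline readable, which may help the tempo and contextuality ratings described under Evaluation.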

In 2021 we will continue the video summarization task started in 2020:

1- Main Task: Questions Unknown

Systems will be asked to submit automatically generated summaries for five specified characters of the Eastenders series:
Time period limited to between 8 and 12 weeks of the series.
Videos of the series which can be used for summarization will be specified.
Maximum number of shots which can be used in summaries will be specified.
Ground truth from the 2020 task will be made available for training systems in 2021: here.
For this main task the 5 content questions will not be known in advance.

2- Subtask: Questions Known

This subtask will be run in the same way as the main task, except that the content questions will be made known to teams in advance.
Submission dates for this subtask will be later than those of the main task, and the questions will only be made known to teams once the submission deadline for the main task has passed.

Sample Summarization Case: Heather

In the example of summarizing the major life events of Heather, the following are examples of the kind of questions human assessors are likely to be asked as they rate the quality of summaries, followed by an example of the video clips that would answer those questions. Note that the answer does not have to be stated explicitly in the videos; it is enough that the clips can be said to answer those questions.

A - How was Heather taken to the hospital?

B - Why was she taken to the hospital?

C - What name does she give her child?

D - Who is the father of her child?

Data Resources

Dataset
About 244 video files (300 GB, 464 h) of BBC EastEnders video in MPEG-4 format. See here for information on how to get a copy of the test data.

The BBC Eastenders test data will be available from Dublin City University.
Auxiliary data: Participants are allowed and encouraged to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the active participants mailing list as soon as they discover them.

Topics (Characters to Summarize):
Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos containing the person of interest in a variety of different appearances to the extent possible.
For each frame image of a target person, there will be a binary mask of the region of interest (ROI), bounded by a single polygon, together with the ID from the master shot reference of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we assume the searcher wants to keep the process simple, so the ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. In addition to example images of the person of interest, the shot videos from which the images were taken will also be given as video examples.
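Because each example image comes with a binary ROI mask, one possible first step is to crop the frame to the region the mask covers before computing appearance features. A minimal stdlib Python sketch, assuming the mask and the frame have already been decoded (for example with an image library of your choice) into equally sized lists of pixel rows, with nonzero mask values marking the ROI; `mask_bounding_box` and `crop_to_box` are hypothetical helper names. Since the ROI may include non-target pixels, the crop is only an approximation of the person of interest.

```python
def mask_bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, of the nonzero mask pixels,
    or None if the mask is empty. Assumes a rectangular list of pixel rows."""
    rows = [r for r, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [c for c in range(len(mask[0])) if any(row[c] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_to_box(frame, box):
    """Crop a frame (list of pixel rows) to an inclusive bounding box."""
    top, left, bottom, right = box
    return [row[left:right + 1] for row in frame[top:bottom + 1]]
```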
Sharing of components:
- Docker image tools for development are available here. Contact the author, Robert Manthey, if you have questions about using them.
- We encourage teams to share development resources with other active participants to expedite system development.

Important Dates:

Please check the TRECVID 2021 schedule for important dates.

Run submission format:

In each run, participants will submit results against the BBC Eastenders dataset for all, and only, the five characters chosen for that year's summarization task, within the time frame specified by NIST.
Each team is asked to submit 4 prioritized runs per task submission.
Submissions will comprise the final automatically generated video summary for each topic, in .mp4 format, together with the XML file described below, which fully describes the run submissions.
Video summaries must be named <TEAM_NAME>_<TASK_Number>_<RUN_Number>_<TOPIC>.mp4, where TASK_Number is 1 for the main task (questions unknown) and 2 for the subtask (questions known).
For example, team SiriusCyberCo, submitting its second run on the main task (questions unknown) for topic Heather, must name the submission: SiriusCyberCo_1_2_Heather.mp4
Team SiriusCyberCo, submitting its fourth run on the subtask (questions known) for topic Heather, must name the submission: SiriusCyberCo_2_4_Heather.mp4
Please note: only submissions that are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission that does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
The DTD for the summarization results of one run is available for download here, together with a small example of what a site would send to NIST for evaluation (view the page source to see the entire file). Please check that your submission is well-formed.
Please submit the information for each run in a separate file, named to make clear which team it is from. Each file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSummarizationResults element, even if only one run is included.
Submissions will be transmitted to NIST using this password-protected webpage.
The VSUM Java run checker can be found in the VSUM Active Directory. Please check your files before submission.
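Before running the official checkers, a team may want a quick local sanity check. The Python sketch below (standard library only) builds and validates summary file names following the naming pattern above, and checks that a run file is well-formed XML, contains a DOCTYPE statement, and has a videoSummarizationResults root element. It does not validate against the NIST DTD, so Xerces-J or the VSUM run checker is still required. The function names are hypothetical, and the regular expression assumes run numbers 1 to 4 and alphabetic topic names.

```python
import re
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# <TEAM_NAME>_<TASK_Number>_<RUN_Number>_<TOPIC>.mp4
# Assumptions: task is 1 (main) or 2 (subtask), run is 1-4, topic is letters only.
SUMMARY_NAME = re.compile(
    r"^(?P<team>.+)_(?P<task>[12])_(?P<run>[1-4])_(?P<topic>[A-Za-z]+)\.mp4$"
)


def summary_filename(team, task, run, topic):
    """Build a summary file name and reject it if it breaks the pattern."""
    name = f"{team}_{task}_{run}_{topic}.mp4"
    if not SUMMARY_NAME.match(name):
        raise ValueError(f"invalid summary filename: {name}")
    return name


def check_run_xml(text):
    """Return a list of problems; an empty list means the basic checks passed.
    Only well-formedness, DOCTYPE presence, and the root element name are
    checked; this is NOT validation against the NIST DTD."""
    try:
        doc = minidom.parseString(text)
    except ExpatError as err:
        return [f"not well-formed: {err}"]
    problems = []
    if doc.doctype is None:
        problems.append("missing DOCTYPE statement")
    if doc.documentElement.tagName != "videoSummarizationResults":
        problems.append("root element is not videoSummarizationResults")
    return problems
```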

Queries:

The following table specifies this year's query characters, the time frame of the series (Start Shot # and End Shot #), links to images of the query characters, and the maximum length and number of shots for each run.
Important: All participating teams should submit 4 runs for each query, using the specified maximum number of shots and maximum summary length for each run.

Character	Max	Jack	Tanya	Peggy	Archie
Start Shot #	shot60_1	shot60_1	shot60_1	shot79_1	shot79_1
End Shot #	shot70_2040	shot70_2040	shot70_2040	shot89_2036	shot89_2036
Images	Images	Images	Images	Images	Images
Max # Shots Run 1	5	5	5	5	5
Max Summary Length Run 1	50 seconds	50 seconds	50 seconds	50 seconds	50 seconds
Max # Shots Run 2	10	10	10	10	10
Max Summary Length Run 2	100 seconds	100 seconds	100 seconds	100 seconds	100 seconds
Max # Shots Run 3	15	15	15	15	15
Max Summary Length Run 3	150 seconds	150 seconds	150 seconds	150 seconds	150 seconds
Max # Shots Run 4	20	20	20	20	20
Max Summary Length Run 4	200 seconds	200 seconds	200 seconds	200 seconds	200 seconds
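The table can be encoded directly to check a planned run automatically. The sketch below copies the limits and time frames from the table; it assumes that shot IDs follow the shot<video>_<shot> layout seen in the Start and End Shot rows, and that ordering by video number, then shot number, reflects the order of the series. `check_run` and its input format (a list of `(shot_id, duration_seconds)` pairs) are hypothetical.

```python
import re

# Per-run limits copied from the table: (max unique shots, max length in seconds).
RUN_LIMITS = {1: (5, 50), 2: (10, 100), 3: (15, 150), 4: (20, 200)}

# Time frames copied from the table: (start shot, end shot), inclusive.
TIME_FRAMES = {
    "Max": ("shot60_1", "shot70_2040"),
    "Jack": ("shot60_1", "shot70_2040"),
    "Tanya": ("shot60_1", "shot70_2040"),
    "Peggy": ("shot79_1", "shot89_2036"),
    "Archie": ("shot79_1", "shot89_2036"),
}

SHOT_ID = re.compile(r"^shot(\d+)_(\d+)$")


def shot_key(shot_id):
    """Map an ID assumed to look like shot<video>_<shot> to a sortable tuple."""
    m = SHOT_ID.match(shot_id)
    if m is None:
        raise ValueError(f"unexpected shot ID: {shot_id}")
    return int(m.group(1)), int(m.group(2))


def check_run(topic, run, shots):
    """Return a list of limit violations for one planned summary."""
    max_shots, max_seconds = RUN_LIMITS[run]
    start, end = (shot_key(s) for s in TIME_FRAMES[topic])
    problems = []
    ids = [shot_id for shot_id, _ in shots]
    if len(set(ids)) > max_shots:
        problems.append(f"more than {max_shots} unique shots")
    total = sum(duration for _, duration in shots)
    if total > max_seconds:
        problems.append(f"summary is {total:.1f}s, limit is {max_seconds}s")
    for shot_id in ids:
        if not start <= shot_key(shot_id) <= end:
            problems.append(f"{shot_id} is outside the {topic} time frame")
    return problems
```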

Sub Task - Questions

Jack:

What happens when police break in the door of Jack and Tanya's home?
Where are Max and Jack during the violent confrontation between them when a gun is drawn?
Who does Jack offer to pay in order to withdraw their statement to the police?
Why is Jack a suspect in the hit and run on Max?
What does Jack reveal to Tanya about his dodgy past?

Max:

What was the cause of Max's serious injuries which left him in hospital?
What is/was the relationship between Max and Tanya?
What kind of weapon does Max obtain from Phil?
Where are Max and Jack during the violent confrontation between them when a gun is drawn?
Who is responsible, or who does Max believe is responsible, for the serious injuries which left him in hospital?

Tanya:

What does Tanya reveal to the police while being interviewed at the station?
What is/was the relationship between Max and Tanya?
What does Jack reveal to Tanya about his dodgy past?
What does Tanya discover in the sink and on Jack's clothes?
What big move were Tanya and Jack planning for the future?

Archie:

What happens when Phil throws Archie into a pit?
What happens after Danielle reveals to Archie that Ronnie is her mother?
Where do Peggy and Archie get married?
What happens when Archie arrives at the pub after Peggy invited him?
What happens when Archie is kidnapped?

Peggy:

Who does Peggy ask to kill Archie?
Where do Peggy and Archie get married?
Show one of the challenges which Peggy faces in her election run.
What does Peggy overhear Archie saying, which causes their marriage to be over?
What is Janine doing to irritate or anger Peggy?

Evaluation:

In 2021, all submitted video summaries will be evaluated by assessors at Dublin City University.
A set of questions for each summary will be disseminated to assessors, but not to participants, for evaluation of summary content.
Summaries are also evaluated for the tempo, contextuality, and redundancy of the generated video:

Estimate the Tempo and Rhythm of this video summary, on a Likert scale of 1 - 7. High is best.
Tempo/Rhythm Defined as: How well do the video shots flow together? Do shots cut mid-sentence (indicating poor tempo/rhythm)? Do they flow together nicely so it wouldn't be obvious that this is an automatically generated summary (high tempo/rhythm)?
Estimate the Contextuality provided by this video summary, on a Likert scale of 1 - 7. High is best.
Contextuality Defined as: Does the content provide the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed? (High is best)
Estimate the level of Redundancy in this video summary, on a Likert scale of 1 - 7. Low is best.
Redundancy Defined as: Does the video contain content considered to be unnecessary or superfluous? (Low is best)

Measures:

Scoring measures for summaries will be calculated from the content based questions and also from the tempo, contextuality, and redundancy based Likert scale estimates described above.
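The guidelines do not state how the content questions and Likert ratings are combined into final scores. For teams analysing the feedback on their own runs, a minimal sketch that averages each criterion across assessors might look like the following. The function name, the field names, and the input layout (one record per assessor) are hypothetical, and this is not the scoring procedure used by NIST or Dublin City University.

```python
from statistics import mean


def summarize_assessments(assessments):
    """Average per-assessor judgments for one summary.

    Hypothetical input: one dict per assessor with
      'answered'      - list of bools, one per content question
      'tempo'         - Likert 1-7, high is best
      'contextuality' - Likert 1-7, high is best
      'redundancy'    - Likert 1-7, low is best
    """
    return {
        # Fraction of content questions each assessor judged answered, averaged.
        "questions_answered": mean(sum(a["answered"]) / len(a["answered"])
                                   for a in assessments),
        "tempo": mean(a["tempo"] for a in assessments),
        "contextuality": mean(a["contextuality"] for a in assessments),
        "redundancy": mean(a["redundancy"] for a in assessments),
    }
```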

Important notes

The BBC requires all VSUM task participants to fill in, sign, and submit a renewal data license agreement in order to use the Eastenders data. This means that even if a past participant has a copy of the data, the team must submit a renewal license form before any submission runs can be accepted and evaluated.
No human prior knowledge of the closed world of the Eastenders dataset may be used to filter content. Any filtering methods must be fully automatic, without fine-tuning based on human knowledge of the Eastenders dataset.
Use of the included XML transcript files is limited to the transcribed text only, and does not extend to any other metadata or XML attributes (e.g., text color).
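One way to respect the transcript restriction is to extract only the text content and discard tags and attributes entirely. A minimal Python sketch using the standard library; the element layout of the transcript files is not assumed (all text nodes are simply joined with single spaces), and `transcript_text` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def transcript_text(xml_text):
    """Return only the transcribed text from a transcript file, ignoring every
    tag name and attribute (e.g. text color). Text nodes are joined with
    spaces and runs of whitespace are collapsed."""
    root = ET.fromstring(xml_text)
    return " ".join(" ".join(root.itertext()).split())
```

Joining text nodes with spaces may occasionally insert a space where inline markup split a word or punctuation mark; that is a deliberate simplification of this sketch.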