Video Summarization (VSUM)

Task Coordinators: Keith Curtis and Gareth Jones

An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, movies, TV shows, etc.) is to summarize the video in order to reduce its size and concentrate the high-value information in the video track. In 2021 we continue the video summarization track in TRECVID, in which the task is to summarize the major life events of specific characters over a number of weeks of programming on the BBC Eastenders TV series. Typically, five characters will be chosen for this task every year, and summaries of their major life events must fall within the selected period of the show, which will be specified to participants in advance of the task.

The use case for this task is to generate an automatic summary, using a predefined maximum number of unique shots, of the significant life events of a given character from the Eastenders series over a given number of episodes. The generated summaries should be enough to give a clear and concise overview of that character's major life events over the course of 8 to 12 weeks of programming in the series, and to show how they intertwine with the major life events of other specified characters in that time frame.

System Task

Given a collection of BBC Eastenders test videos, a master shot boundary reference, a list of characters from the series, and a time frame of the series to use for summarization, summarize the major life events of each character within the specified time frame. Major life events are more likely to be, for example: the birth of a child rather than a short illness, a divorce rather than an argument with a loved one, or the passing of a loved one rather than the passing of someone loosely known to you. Summaries are limited to a maximum number of unique shots, so the main challenge is to select the shots most likely to be considered a major life event by human assessors.
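
The shot budget constraint described above can be illustrated with a minimal sketch. It assumes that some upstream model has already assigned each candidate shot a relevance score; the `Shot` fields, the scores, and the shot IDs are hypothetical and are not defined by the task. Real shot IDs would come from the master shot boundary reference.

```python
from typing import NamedTuple


class Shot(NamedTuple):
    shot_id: str    # hypothetical ID; real IDs come from the master shot reference
    position: int   # chronological position within the selected time frame
    score: float    # hypothetical "major life event" score from an upstream model


def select_summary_shots(shots: list[Shot], max_shots: int) -> list[Shot]:
    """Pick at most max_shots unique shots by score, then restore story order."""
    best_by_id: dict[str, Shot] = {}
    for shot in shots:
        # Enforce uniqueness: keep only the highest-scoring entry per shot ID.
        current = best_by_id.get(shot.shot_id)
        if current is None or shot.score > current.score:
            best_by_id[shot.shot_id] = shot
    # Rank unique shots by score and cut at the budget.
    ranked = sorted(best_by_id.values(), key=lambda s: s.score, reverse=True)
    chosen = ranked[:max_shots]
    # Play the chosen shots in chronological order.
    return sorted(chosen, key=lambda s: s.position)


candidates = [
    Shot("a", 3, 0.9),
    Shot("b", 1, 0.4),
    Shot("c", 2, 0.8),
    Shot("a", 3, 0.5),  # duplicate of shot "a" with a lower score
]
summary = select_summary_shots(candidates, max_shots=2)
print([s.shot_id for s in summary])
```

Greedy top-k selection is only one possible strategy; it ignores redundancy between shots and narrative flow, both of which assessors also rate.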

In 2021 we will continue the video summarization task started in 2020:

1- Main Task: Questions Unknown

Systems will be asked to submit automatically generated summaries for five specified characters of the Eastenders series:
Time period limited to between 8 and 12 weeks of the series.
Videos of the series which can be used for summarization will be specified.
Maximum number of shots which can be used in summaries will be specified.
Ground truth from the 2020 task will be made available for training systems in 2021.
For this main task the 5 content questions will not be known in advance.

2- Subtask: Questions Known

This subtask will be run in the same way as the main task, except that content questions will be made known to teams in advance.
Submission dates for this subtask will be later than for the main task, and questions will only be made known to teams once the submission deadline for the main task has passed.

Sample Summarization Case: Heather

    In the example of summarizing the major life events of Heather, the following are examples of the kind of questions likely to be asked of human assessors as they rate the quality of summaries, followed by an example of the video clips which would answer those questions. Note that the answer does not have to be explicitly stated in the videos; it is enough that the clips can be said to answer those questions.

  • A - How was Heather taken to the hospital?
  • B - Why was she taken to the hospital?
  • C - What name does she give her child?
  • D - Who is the father of her child?

Data Resources

    • 300 GB, 464 h of the BBC Eastenders test data will be available from Dublin City University.

    • Auxiliary data: Participants are allowed and encouraged to use various publicly available EastEnders resources as long as they carefully note the use of each such resource by name in their workshop notebook papers. They are strongly encouraged to share information about the existence of such resources with other participants via the active participants mailing list as soon as they discover them.

  • Topics (Characters to Summarize):

    Each topic will consist of a set of 4 example frame images (bmp) drawn from test videos containing the person of interest in a variety of different appearances to the extent possible.

    For each frame image (of a target person) there will be a binary mask of the region of interest (ROI), as bounded by a single polygon, and the ID from the master shot reference of the shot from which the image example was taken. In creating the masks (in place of a real searcher), we will assume the searcher wants to keep the process simple. So, the ROI may contain non-target pixels, e.g., non-target regions visible through the target or occluding regions. In addition to example images of the person of interest, the shot videos from which the images were taken will also be given as video examples.

  • Sharing of components:

    • Docker image tools for development are available here. Contact the author Robert Manthey if you have questions about using them.
    • We encourage teams to share development resources with other active participants to expedite system development.

Important Dates:

Run submission format:

  • Participants will submit results against the BBC Eastenders dataset in each run for all and only the 5 main characters chosen for the summarization task that year, within the time frame specified by NIST.
  • Each team is asked to submit 4 prioritized runs per task submission.
  • Submissions will consist of the final automatically generated video summary for each topic, in .mp4 format, in addition to the xml container, as below, fully describing the run submissions.
  • Video summaries must be named <TEAM_NAME>_<TASK_Number>_<RUN_Number>_<TOPIC>.mp4
  • For example, team SiriusCyberCo, submitting their second run, on the main task of unknown questions, for topic Heather, must name their submission: SiriusCyberCo_1_2_Heather.mp4
    SiriusCyberCo, submitting their fourth run, on the subtask of known questions, for topic Heather, must name their submission: SiriusCyberCo_2_4_Heather.mp4
  • Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
  • Here for download (right click and choose "display page source" to see the entire file) is the DTD for summarization results of one run and a small example of what a site would send to NIST for evaluation. Please check your submission to see that it is well-formed.
  • Please submit the information for each run in a separate file, named to make clear which team it is from. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement and a videoSummarizationResults element even if only one run is included: <!DOCTYPE videoSummarizationResults SYSTEM "">
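
The naming rule for video summaries above can be checked before upload with a small helper. This is only a sketch under stated assumptions: it takes task number 1 to be the main task and 2 the subtask (as in the SiriusCyberCo examples), assumes run numbers go from 1 to 4, and assumes team and topic names contain no underscores so the name can be split unambiguously. The guidelines do not state the last two points, and the function names are illustrative.

```python
import re

VALID_TASKS = {1, 2}  # 1 = main task (questions unknown), 2 = subtask (questions known)
MAX_RUNS = 4          # assumption: the 4 prioritized runs are numbered 1 to 4

# Assumption: neither team nor topic names contain underscores.
_NAME_RE = re.compile(r"^([^_]+)_(\d+)_(\d+)_([^_]+)\.mp4$")


def _check(task: int, run: int) -> None:
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task number: {task}")
    if not 1 <= run <= MAX_RUNS:
        raise ValueError(f"run number out of range: {run}")


def summary_filename(team: str, task: int, run: int, topic: str) -> str:
    """Build <TEAM_NAME>_<TASK_Number>_<RUN_Number>_<TOPIC>.mp4."""
    _check(task, run)
    if not team or not topic or "_" in team or "_" in topic:
        raise ValueError("team and topic must be non-empty and contain no underscores")
    return f"{team}_{task}_{run}_{topic}.mp4"


def parse_summary_filename(name: str) -> tuple[str, int, int, str]:
    """Split a summary file name back into (team, task, run, topic)."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a valid summary file name: {name}")
    team, task, run, topic = match.groups()
    _check(int(task), int(run))
    return team, int(task), int(run), topic


print(summary_filename("SiriusCyberCo", 1, 2, "Heather"))
print(parse_summary_filename("SiriusCyberCo_2_4_Heather.mp4"))
```

This helper only covers file names; it does not replace validating the xml container against the supplied DTD.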


  • In 2021, all submitted video summaries will be evaluated by assessors at Dublin City University.
  • A set of questions for each summary will be disseminated to assessors, but not to participants, for evaluation of summary content.
  • Summaries are also evaluated according to tempo, contextuality, and redundancy of generated video summaries:
    • Estimate the Tempo and Rhythm of this video summary, on a Likert scale of 1 - 7. High is best.
      Tempo/Rhythm Defined as: How well do the video shots flow together? Do shots cut mid-sentence (indicating poor tempo/rhythm)? Do they flow together nicely so it wouldn't be obvious that this is an automatically generated summary (high tempo/rhythm)?
    • Estimate the Contextuality provided by this video summary, on a Likert scale of 1 - 7. High is best.
      Contextuality Defined as: Does the content provide the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood and assessed? (High is best)
    • Estimate the level of Redundancy in this video summary, on a Likert scale of 1 - 7. Low is best.
      Redundancy Defined as: Does the video contain content considered to be unnecessary or superfluous? (Low is best)


  • Scoring measures for summaries will be calculated from the content-based questions and also from the tempo, contextuality, and redundancy Likert scale estimates described above.
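
The exact scoring formula is not given here. As an illustration only, the sketch below averages the three Likert ratings over assessors and also reports a reverse-scored redundancy value, so that all three criteria point in the same "high is best" direction. The averaging, the reverse-scoring, and the dictionary keys are assumptions for this example, not the official measure.

```python
from statistics import mean

LIKERT_MIN, LIKERT_MAX = 1, 7


def check_likert(value: int) -> int:
    """Reject ratings outside the 1 to 7 Likert scale."""
    if not LIKERT_MIN <= value <= LIKERT_MAX:
        raise ValueError(f"rating outside {LIKERT_MIN}-{LIKERT_MAX}: {value}")
    return value


def aggregate_ratings(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average each criterion over assessors for one summary."""
    tempo = mean(check_likert(r["tempo"]) for r in ratings)
    contextuality = mean(check_likert(r["contextuality"]) for r in ratings)
    redundancy = mean(check_likert(r["redundancy"]) for r in ratings)
    return {
        "tempo": tempo,                  # high is best
        "contextuality": contextuality,  # high is best
        "redundancy": redundancy,        # low is best
        # Assumption: reverse-score redundancy (1 <-> 7) so that high is best.
        "redundancy_reversed": LIKERT_MIN + LIKERT_MAX - redundancy,
    }


assessor_ratings = [
    {"tempo": 6, "contextuality": 5, "redundancy": 2},
    {"tempo": 4, "contextuality": 7, "redundancy": 4},
]
print(aggregate_ratings(assessor_ratings))
```

An empty list of ratings makes `statistics.mean` raise `StatisticsError`, so a summary without any assessments has to be handled separately.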

Important notes

  • The BBC requires all VSUM task participants to fill in, sign, and submit a renewal data license agreement in order to use the Eastenders data. That means that even if a past participant has a copy of the data, the team must submit a renewal license form before any submission runs can be accepted and evaluated.
  • No prior human knowledge of the closed world of the Eastenders dataset may be used to filter content. All filtering methods must be automatic, without fine-tuning based on human knowledge of the Eastenders dataset.
  • Use of the included xml transcript files is limited to the transcribed text only, and not to any other metadata or xml attributes (e.g., color of text).

Open Issues:

  • The BBC Eastenders data license is still being coordinated with the BBC. All active participants will be informed when it is ready so that they can submit a signed data agreement and download the data. We don't anticipate delays in 2021 regarding data distribution.

Digital Video Retrieval at NIST
