An important need in many situations involving video collections (archive video search/reuse, personal video organization/search, movies, TV shows, etc.)
is to summarize the video in order to reduce its size while concentrating the high-value information in the video track. In 2022 we introduce the Movie Summarization (MSUM) track in TRECVID, replacing the previous Video Summarization (VSUM) track. The track will use a licensed movie dataset from Kinolorberedu; the goal is to summarize the storylines and roles of specific characters across a full movie.
The goals for this track are to:
Efficiently capture important facts about specific characters and their roles in the movie storyline.
Assess how video summarization and textual summarization compare in this domain.
Given a movie, a character, and image/video examples of that character, generate a video summary highlighting the major key-fact events about the character (similar to the TV20 & TV21 VSUM tasks). Video summaries will be limited by a maximum summary length. See below for further details on what constitutes a key-fact event and for details on annotation and assessment.
Given a movie, a character, and image/video examples of that character, generate a textual summary that includes the key-fact events about the character's role in the movie. Textual summaries will be limited by a maximum number of sentences and a maximum number of words. See below for further details on what constitutes a key-fact event and for details on annotation and assessment.
Annotation and Assessment
Human annotators will:
Watch each movie
For selected characters, extract key-fact events about them
Video Summary evaluation:
Assessors will watch submitted summaries (subject to max duration)
Systems are rewarded for including the key-fact events
Scoring is based on the percentage of correct key-facts included in the summaries
Subjective evaluation will also be conducted (contextuality, redundancy, etc.)
Textual Summary evaluation:
Systems will submit a summary of up to X sentences and Y words
Assessors will read the submitted textual summary and mark correctly retrieved key-facts
Objective evaluation of retrieved key-facts, regardless of any filler sentences (a scoring sketch follows this list)
Subjective evaluation will also be conducted (readability, contextuality, redundancy, etc.)
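For concreteness, the sketch below computes the objective score described above for both tasks: the percentage of annotated key-facts that assessors marked as present in a submitted summary. The function name and the key-fact identifiers are illustrative assumptions; the official scoring code and data formats are supplied by NIST.

```python
def keyfact_recall(annotated_keyfacts, marked_keyfacts):
    """Percentage of annotated key-facts that assessors marked as present
    in a submitted (video or text) summary.

    Illustrative sketch only: the official NIST scoring procedure and
    data formats are defined by the track organizers.
    """
    if not annotated_keyfacts:
        return 0.0
    found = set(marked_keyfacts) & set(annotated_keyfacts)
    return 100.0 * len(found) / len(annotated_keyfacts)

# Hypothetical example: 5 annotated key-facts for Jeremy, 3 marked as found.
truth = {"bullied", "playground_fight", "illness_revealed", "hospital", "death"}
marked = {"playground_fight", "hospital", "death"}
print(keyfact_recall(truth, marked))  # 60.0
```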
What is a key-fact event?
Any event that is important and critical in the character's storyline.
Key-fact events should cover the character's role from the start to the end of the movie.
Example: From the example movie “Super Hero” (below) – Character: Jeremy
Charlie bullies Jeremy
Charlie and Jeremy fight at the playground
Jeremy's mother reveals to the principal that Jeremy has a terminal illness
Jeremy gets admitted to the hospital
Jeremy passes away
Important points
A key-fact event regarding a character does not necessarily require
that character to be visible in the scene. In the above example 'Super
Hero', Jeremy's mother revealed to the principal that Jeremy had a
terminal illness. This would clearly count as a key-fact regarding Jeremy even
though he was not present in the scene.
The purpose of this task is to summarize the important key-facts
for a character. As such, this is different from a movie trailer. Key
events should appear in the order in which they become apparent in
the movie, and should ideally capture that character's storyline.
The number of allowed key facts is limited per movie and
character. One of the major challenges of the task is to separate
major key facts from inconsequential details. For example:
'Daryl broke up with his girlfriend over breakfast' is more
likely to be a major key fact than 'Daryl had eggs and toast
for breakfast'.
Data Resources
Dataset
This track will use a licensed movie dataset from Kinolorberedu. For the current year of the track, 10 full movies will be made available to participating teams.
To access the training and testing dataset (available HERE), please submit the
data agreement form to gawad@nist.gov.
Topics (Characters to Summarize):
Each topic will consist of a movie, the character whose key-fact events are to be summarized, and a set of image/video examples of that character.
For video summaries, a maximum summary duration (in seconds) will be specified for each character. For text summaries, a maximum number of sentences
will likewise be specified for each character. A sentence in a text summary can be either a key-fact (the focus of the task) or a filler sentence.
The maximum number of sentences a run may submit for a given character includes all key-facts and filler sentences. A sketch of one such topic follows.
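For illustration, a topic can be represented roughly as the following record; the field names and values are our own assumptions, not an official NIST topic schema.

```python
from dataclasses import dataclass, field

@dataclass
class Topic:
    """Rough sketch of one MSUM topic; field names are assumptions,
    not the official NIST topic format."""
    movie: str                       # movie to summarize from
    character: str                   # character whose role is summarized
    example_media: list[str] = field(default_factory=list)  # image/video examples
    max_video_seconds: int = 0       # per-character limit for the video summary
    max_sentences: int = 0           # per-character limit (key-facts + fillers)

# Hypothetical topic for the example movie above.
topic = Topic(
    movie="SuperHero",
    character="Jeremy",
    example_media=["jeremy_01.jpg", "jeremy_clip.mp4"],
    max_video_seconds=180,
    max_sentences=12,
)
```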
Sharing of components:
Docker image tools for development are available here.
Contact the author, Robert Manthey, if you have questions about using them.
We encourage teams to share development resources with other active participants to expedite system development.
Participants will submit results against the Kino Lorber movie
dataset in each run for all and only the characters chosen for the
summarization task that year, using the movies specified by NIST.
Teams may submit up to 4 prioritized runs per task (priorities 1 - 4).
Text submissions will consist of the final automatically
generated text summary for each topic, in an XML container, as below, fully describing the run submission.
Video submissions will consist of the final automatically
generated video summary for each topic, in addition to an XML container, as below, fully describing the run submission.
All submitted summaries must be named
<TEAM_NAME>_<MOVIE_NAME>_<RUN_number>_<Text|Video>.xml
or <TEAM_NAME>_<MOVIE_NAME>_<TARGET_NAME>_<RUN_number>_<Video>.mp4
For example, team SiriusCyberCo, submitting their text
summaries for each target character, for the movie SuperHero,
for their first run, must name their submission:
SiriusCyberCo_SuperHero_1_Text.xml. SiriusCyberCo, submitting their video
summaries for target character Jeremy, for the movie SuperHero,
for their second run, must name their submissions:
SiriusCyberCo_SuperHero_2_Video.xml and SiriusCyberCo_SuperHero_Jeremy_2_Video.mp4
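The helper below is a small sketch that builds filenames following this convention; only the naming pattern comes from the guidelines, the function itself is our own.

```python
def submission_names(team, movie, run, kind, targets=()):
    """Build MSUM submission filenames per the convention above.

    kind is "Text" or "Video"; video runs also get one .mp4 per target
    character. Illustrative helper only.
    """
    names = [f"{team}_{movie}_{run}_{kind}.xml"]
    if kind == "Video":
        names += [f"{team}_{movie}_{t}_{run}_Video.mp4" for t in targets]
    return names

print(submission_names("SiriusCyberCo", "SuperHero", 1, "Text"))
# ['SiriusCyberCo_SuperHero_1_Text.xml']
print(submission_names("SiriusCyberCo", "SuperHero", 2, "Video", targets=["Jeremy"]))
# ['SiriusCyberCo_SuperHero_2_Video.xml', 'SiriusCyberCo_SuperHero_Jeremy_2_Video.mp4']
```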
Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission
before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various
checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
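As an alternative to Xerces-J, the validity check can be done in Python with lxml; the DTD and submission filenames below are placeholders for the files you actually download and produce.

```python
# DTD validity check with lxml (pip install lxml).
# Filenames are placeholders; use the DTD downloaded from the track page.
from lxml import etree

dtd = etree.DTD("MovieSummarizationTextResults.dtd")   # placeholder name
doc = etree.parse("SiriusCyberCo_SuperHero_1_Text.xml")

if dtd.validate(doc):
    print("Submission is valid against the DTD.")
else:
    print(dtd.error_log.filter_from_errors())
```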
Here for download (right click and choose "display page source" to see the entire file) are the
DTD for text summarization results of one run and a small example of the XML file that a site would send to NIST for evaluation.
Please check your submission to see that it is well-formed.
Here for download (right click and choose "display page source" to see the entire file) are the
DTD for video summarization results of one run and a
small example of the XML file that a site would send to NIST for evaluation.
Please check your submission to see that it is well-formed.
Please submit each run's information in a separate file, named so that it is clear which team it is from. EACH file you submit should begin, as in the example
submission, with the DOCTYPE statement and a
MovieSummarizationTextResults or
MovieSummarizationVideoResults element, even if only one run is included:
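For illustration, the sketch below writes the required shell of a text-run submission file. The DOCTYPE statement and root element name come from the guidelines above; the SYSTEM identifier and everything inside the root are placeholders that must follow the official DTD supplied by NIST.

```python
# Shell of a text-run submission file. The root element name is specified by
# the guidelines; the SYSTEM identifier and the run contents are placeholders
# that must follow the official NIST DTD.
skeleton = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE MovieSummarizationTextResults SYSTEM "MovieSummarizationTextResults.dtd">
<MovieSummarizationTextResults>
  <!-- one run per file; contents defined by the official DTD -->
</MovieSummarizationTextResults>
"""

with open("SiriusCyberCo_SuperHero_1_Text.xml", "w", encoding="utf-8") as f:
    f.write(skeleton)
```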
Queries:
TBD