Social-media video storytelling linking
Introduction
When an event occurs in real life, there may be little information available at first. However, as pictures and textual descriptions of the event are published on social media, they form an evolving story that is of critical value to affected or interested people. This evolving story corresponds to a semantic topic that can be tracked over time with both textual and visual information from multiple social-media sources (i.e., end-users or online services). Moreover, as social-media sources continue to publish information about the story, it becomes critical to select the most relevant information. Thus, based on all the collaborative audio-visual and textual information, one can create summaries or stories of those real-world events.
The social-media video storytelling linking task seeks to advance the area of visual summarization, with collaborative videos, images and texts available from professional media and social-media users. The challenges in creating such visual timelines are various: alignment of videos and images through sound or timestamps to smooth transitions or remove duplicates, detection of video intervals of high interest, caption generation for a group of pictures, among others.
Task
The goal is to illustrate a news story with social-media content. Starting from a news story topic and a stream of social-media videos and images, systems should link each story segment to image and video material while preserving a good flow of the whole visual story.
A news story topic is an actual news narrative, and the news segments correspond to particular sentences of the news that a journalist may wish to illustrate. For each story segment (a sentence query with a strong visual component), systems should retrieve the video or image that satisfies the following two requirements:
- Best illustrates the news segment;
- Makes the best transition from the previous video/image illustration.
For more detailed guidelines, please refer to this document.
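The two requirements above can be combined in many ways; the following is only a minimal sketch of one possible baseline, not a method prescribed by the task. It assumes that a relevance score per candidate and a transition score between two candidates are already available. The function name link_story, the weight parameter and the toy media identifiers are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def link_story(
    segment_candidates: Sequence[Dict[str, float]],
    transition: Callable[[str, str], float],
    weight: float = 0.5,
) -> List[str]:
    """Greedily pick one illustration per story segment.

    segment_candidates[k] maps candidate media ids to an (assumed,
    precomputed) relevance score for segment k. transition(prev, cur)
    returns an (assumed) score for how well cur follows prev.
    """
    chosen: List[str] = []
    for candidates in segment_candidates:
        def score(media_id: str) -> float:
            total = candidates[media_id]  # requirement 1: relevance
            if chosen:
                # requirement 2: transition from the previous illustration
                total += weight * transition(chosen[-1], media_id)
            return total

        chosen.append(max(candidates, key=score))
    return chosen


# Toy data: two segments with two hypothetical candidates each.
candidates = [
    {"img_a": 0.9, "vid_b": 0.6},
    {"img_c": 0.7, "vid_d": 0.65},
]
related = {("img_a", "vid_d")}  # pairs assumed to make a good transition


def toy_transition(prev: str, cur: str) -> float:
    return 1.0 if (prev, cur) in related else 0.0


story = link_story(candidates, toy_transition)
relevance_only = link_story(candidates, toy_transition, weight=0.0)
print(story, relevance_only)
```

In this toy case, the transition term makes the sketch prefer vid_d for the second segment even though img_c is slightly more relevant. A greedy pass never revisits earlier choices, so a dynamic-programming search over the whole sequence may give better flow.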
Working example
This example aims to illustrate a story about street circus at the Edinburgh Festival. Starting from a dataset of social-media images and videos, and from the description of the story segments, the goal is to find the most suitable illustrations for those segments.
Evaluation will assess (1) the relevance of illustrations and (2) the consistency of the illustrations in terms of the relation between two consecutive illustrations.
Data
To run the Social-Media Visual Storytelling Linking task, we collected media and news stories about two events:
- Edinburgh Festival: A celebration of the performing arts, gathering dance, opera, music and theatre performers from all over the world. The event takes place in Edinburgh, Scotland, and lasts 3 weeks in August.
- Le Tour de France: One of the main road cycling races. The event takes place in France (day 1-8, 11-17, 20-23), Spain (day 9), Andorra (day 9-11) and Switzerland (day 17-19), and lasts 23 days in July.
The collected data include news stories from verified news sources, which will be used as the story topics and segments. Social-media data are collected from Twitter and Flickr by a focused crawler that gathers event-specific images and videos. In summary, the data will include:
Twitter images and videos:
- Edinburgh Festival: over 32k images and 6.2k videos;
- Le Tour de France: over 66k images and 19k videos.
Flickr images:
- Edinburgh Festival: over 10k images;
- Le Tour de France: over 11k images.
Please see the LNK task website for data download instructions.
Evaluation
A story summary will consist of a sequence of images and videos related to a news story, and each individual illustration must match a story segment in the context of the story. Story illustrations will be assessed in terms of:
Relevance of the segment illustration (blue links in the above figure): The relevance of the illustration to the story segment will be judged as:
- s_i=0: the image or video illustration is not relevant to the story segment.
- s_i=1: the image or video illustration is relevant to the story segment.
Consistency of illustration transitions (red links in the above figure): Each video and image will be judged in terms of its relation to the previous segment illustration:
- t_i=0: there is no relation between the segment illustrations.
- t_i=1: there is a semantic (and visual) relation between the two segments.
The overall summary quality is given by the expression:

where the function pairwiseQuality(i, i-1) quantifies the perceived quality of two neighbouring illustrations based on the relevance and transition judgments:
pairwiseQuality(i, i-1) = 0.4∙s_{i-1} + 0.2∙s_i + 0.2∙t_i + 0.2∙s_{i-1}∙s_i, where s_0 = 1.
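As a worked illustration of the pairwiseQuality expression above, the following sketch evaluates it for a few combinations of judgments. It is not the official scoring tool, and it does not compute the overall summary quality. The function and variable names are chosen here for illustration only.

```python
def pairwise_quality(s_prev: int, s_cur: int, t_cur: int) -> float:
    """pairwiseQuality(i, i-1) from the binary judgments.

    s_prev: relevance s_{i-1} of the previous illustration (s_0 = 1)
    s_cur:  relevance s_i of the current illustration
    t_cur:  transition judgment t_i between the two illustrations
    """
    return 0.4 * s_prev + 0.2 * s_cur + 0.2 * t_cur + 0.2 * s_prev * s_cur


# Both illustrations relevant and semantically related.
best = pairwise_quality(1, 1, 1)
# Both illustrations irrelevant and unrelated.
worst = pairwise_quality(0, 0, 0)
# Previous illustration relevant, current one irrelevant, no relation.
prev_only = pairwise_quality(1, 0, 0)
# Current illustration relevant, previous one irrelevant, no relation.
cur_only = pairwise_quality(0, 1, 0)

print(round(best, 2), round(worst, 2), round(prev_only, 2), round(cur_only, 2))
```

The weights sum to 1, so the value ranges from 0 (both illustrations irrelevant and unrelated) to 1 (both relevant and related). The relevance of the previous illustration carries twice the weight of the current one, so prev_only (0.4) exceeds cur_only (0.2).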
The 30 test topics are available for download at this link.
Submissions
The above figure illustrates what should be submitted: the video segment links and the video transition links are the key elements that make up the story summary. From a submission point of view, teams will have to consider the following:
- There will be a total of 30 story topics, organized into 3-5 visual story segments each.
- Each run is composed of a sequence of videos/images intended to illustrate the sequence of story segments.
- Teams may submit up to 5 runs. Hence, each team can create up to 5 alternative visual summaries for each story.
- Runs are due on August 26 and should be submitted via the NIST password-protected submission webpage.