
Disaster Scene Description and Indexing (DSDI)

Task Coordinators: Jeffrey Liu, William Drew, George Awad, and Asad Butt

Computer vision capabilities have been advancing rapidly and are expected to become an important component of incident and disaster response. However, the majority of computer vision capabilities do not meet public safety's needs, such as support for search and rescue, due to the lack of appropriate training data and requirements. For example, in 2019 a leading computer vision benchmark mislabeled a flooded region as a "toilet" and a highway surrounded by flooding as a "runway." In response, we have developed a dataset of images of various natural disasters collected by the Civil Air Patrol. Two key distinctions are the low-altitude, oblique perspective of the imagery and the disaster-related features it contains, both of which are rarely featured in computer vision benchmarks and datasets. This task invites researchers to work on this new domain, develop new capabilities, and close the performance gap; in essence, to label video clips with the correct disaster-related feature(s).

The pilot for the DSDI task was successfully conducted in TRECVID 2020, and we will continue the task this year.

System Task

To emphasize the unique disaster-related features of the dataset, systems will be given a set of natural disaster-related features and an unseen (testing) dataset of short video clips with various disaster-related features collected from real-world disaster events, and asked to return, for each feature, a ranked list of the top video clips that include that feature. All run submissions should be the output of a fully automatic system; no interactive system submissions will be accepted.
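
As a concrete illustration of the expected output shape (the official submission format is defined by the DTD described under "Run Submission Format" below), a run can be thought of as a mapping from each feature ID to a ranked list of clip IDs. A minimal sketch, assuming hypothetical per-clip confidence scores from some classifier:

    # Minimal sketch: turn per-clip confidence scores into ranked lists per feature.
    # `scores` is a hypothetical structure {feature_id: {clip_id: confidence}}; real
    # feature IDs come from dsdi.features.txt and clip IDs from the master shot table.
    def rank_clips(scores, max_clips=1000):
        """Return, for each feature, up to `max_clips` clip IDs sorted by confidence."""
        ranked = {}
        for feature_id, clip_scores in scores.items():
            ordered = sorted(clip_scores, key=clip_scores.get, reverse=True)
            ranked[feature_id] = ordered[:max_clips]
        return ranked

    # Example with made-up IDs and scores:
    demo = {1: {"clip_0001": 0.91, "clip_0002": 0.12, "clip_0003": 0.55}}
    print(rank_clips(demo))   # {1: ['clip_0001', 'clip_0003', 'clip_0002']}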

Data Resources

The task will be supported by the LADI (Low Altitude Disaster Imagery) dataset for training & development and another pilot testing dataset to test systems. For an introductory video to the LADI dataset, developed as part of a larger NIST Public Safety Innovation Accelerator Program (PSIAP) grant, please watch this 5-minute YouTube video.

  • Development dataset

    A development dataset based on the LADI dataset, hosted as part of the AWS Public Dataset program, will be available to participants. It consists of more than 20,000 annotated images, each at least 4 MB in size.

    The video testing dataset from 2020 and its annotations are also available for teams to train their systems (a loading sketch follows below).
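
    For teams getting started with the LADI annotations, a minimal loading sketch is shown below. The file name, column names, and delimiter are assumptions for illustration only; consult the LADI documentation (AWS Public Dataset program / GitHub organization) for the actual file layout.

        # Minimal sketch: load a LADI crowd-sourced annotation table with pandas.
        # NOTE: "ladi_aggregated_responses.tsv" and the column names used here
        # ("Answer", "img_url") are hypothetical placeholders; check the LADI
        # documentation for the real file names and schema.
        import pandas as pd

        annotations = pd.read_csv("ladi_aggregated_responses.tsv", sep="\t")

        # e.g., keep only images that workers tagged with any flooding-related label
        flood_mask = annotations["Answer"].str.contains("flood", case=False, na=False)
        flood_images = annotations.loc[flood_mask, "img_url"].unique()
        print(f"{len(flood_images)} images with flooding-related annotations")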

    Dataset Scope

    The initial release of the LADI dataset was focused on the following set of criteria. The crowd-sourced human annotations were prioritized based on these criteria.

    Criteria | Values
    Administrative Boundaries | Alabama, Florida, Georgia, Louisiana, Mississippi, North Carolina, Puerto Rico, South Carolina, Texas, Virgin Islands, Virginia
    Months | March - November
    Years | 2015 - 2019
    Altitude (AGL, ft) | <= 1000
    Image size (MB) | >= 4

    Since 2015, each of these locations has had a FEMA major disaster declaration for a hurricane or flooding. Three of them (Louisiana, South Carolina, and Texas) also had major declarations for flooding during the Atlantic hurricane season months of June to November. The scope also included four of the five locations with the most images collected from 2015-2018; the remaining location, not represented in the scope, was California with its significant wildfire activity.

    The low-altitude criterion is intended to further distinguish the LADI dataset from satellite or "top down" datasets and to support the development of computer vision capabilities for small drones operating at low altitudes. A minimum image size was selected to maximize the efficiency of the crowd-source workers, since lower-resolution images are harder to annotate; 4 MB was slightly less than the average annual file size.
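
    As an illustration of how these scope criteria can be applied programmatically, here is a minimal filtering sketch; the record fields (state, month, year, altitude_agl_ft, filesize_mb) are hypothetical names for illustration, not the dataset's actual metadata keys.

        # Minimal sketch: apply the published LADI scope criteria to hypothetical image records.
        IN_SCOPE_STATES = {
            "alabama", "florida", "georgia", "louisiana", "mississippi",
            "north carolina", "puerto rico", "south carolina", "texas",
            "virgin islands", "virginia",
        }

        def in_scope(record):
            """Return True if an image record satisfies the published scope criteria."""
            return (
                record["state"].lower() in IN_SCOPE_STATES
                and 3 <= record["month"] <= 11          # March - November
                and 2015 <= record["year"] <= 2019
                and record["altitude_agl_ft"] <= 1000   # low altitude, not satellite-like
                and record["filesize_mb"] >= 4          # large enough to annotate reliably
            )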

    For more information about LADI, please refer to the GitHub organization.

  • Testing dataset

    A testing dataset of about 5 hours of video will be distributed to participants according to the published schedule. The testing data will be chosen from operational videos of a recent natural disaster event and will be similar to the testing dataset used in last year's task. The testing data will be segmented into small video clips of at most about 60 seconds each, and a master shot reference table mapping the start and end of each video clip within the original raw video will be provided to participants for use in their run submissions. In addition, some videos will have an accompanying KMZ file with the path information and the start and end locations (latitude and longitude). The test data (links to videos, the master shot table, and metadata) can be found here.
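
    To illustrate how the master shot reference might be consumed, here is a minimal sketch mapping clip IDs back to their source video and time span. The file name and column names (clip_id, source_video, start_seconds, end_seconds) are assumptions for illustration; the table actually distributed with the test data defines the authoritative format.

        # Minimal sketch: look up where a clip sits inside its original raw video.
        # "master_shot_reference.csv" and its column names are hypothetical placeholders.
        import csv

        def load_shot_table(path="master_shot_reference.csv"):
            with open(path, newline="") as f:
                return {row["clip_id"]: row for row in csv.DictReader(f)}

        shots = load_shot_table()
        clip = shots["clip_0001"]
        print(clip["source_video"], clip["start_seconds"], clip["end_seconds"])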

  • Auxiliary Resources

    The LADI dataset also includes machine-generated labels from commercial and open-source image recognition tools to provide additional context. MIT LL ran classifiers trained on ImageNet, Places365, and various commercial classifiers. Each image is tagged with the top 10 labels from each classifier. The open-source image recognition labels were generated using the Lincoln Laboratory Supercomputing Center, and the commercial classifiers were run on the respective commercial vendors' platforms. In particular, we used the pretrained implementation of Inception-ResNetV2 trained on the ImageNet dataset in Keras, and the pretrained implementation of ResNet50 trained on Places365-Standard in PyTorch. We provide these annotations based on practices associated with the YouTube-8M dataset. Here are a couple of examples (a labeling sketch follows the table):

    Type | Source | Field | Description
    ANNOTATION_MACHINE | Places365 | airfield | softmax weight for label class "airfield"
    ANNOTATION_MACHINE | ImageNet | tench, Tinca tinca | softmax weight for label class "tench, Tinca tinca"
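
    For reference, below is a minimal sketch of producing top-10 ImageNet labels with the pretrained Keras Inception-ResNetV2 model. The machine labels are already distributed with LADI; this sketch only illustrates roughly how such annotations can be generated, and details (preprocessing, library versions) may differ from what MIT LL used.

        # Minimal sketch: top-10 ImageNet labels for one image with Keras Inception-ResNetV2.
        # "example_cap_image.jpg" is a placeholder; this is not guaranteed to reproduce
        # the machine labels shipped with LADI.
        import numpy as np
        from tensorflow.keras.applications.inception_resnet_v2 import (
            InceptionResNetV2, preprocess_input, decode_predictions)
        from tensorflow.keras.preprocessing import image

        model = InceptionResNetV2(weights="imagenet")          # expects 299x299 input
        img = image.load_img("example_cap_image.jpg", target_size=(299, 299))
        x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))
        preds = model.predict(x)
        for _, label, score in decode_predictions(preds, top=10)[0]:
            print(f"{label}: {score:.4f}")                     # softmax weight per class
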
    Metadata

    As part of the dataset, we extract and process the metadata and Exif information from each image. This includes information such as date and time, latitude and longitude coordinates, and camera settings. The specific Exif data available varies across the images. Here are a few examples (an extraction sketch follows the table):

    Type | Source | Field | Description
    METADATA | File | fieldpath | path to file location in filesystem
    METADATA | File | HDF5 | location of file in HDF5
    METADATA | File | filesize | size in bytes
    METADATA | EXIF | ImageHeight | height of image in pixels
    METADATA | EXIF | ImageWidth | width of image in pixels
    METADATA | EXIF | GPSLatitude | latitude of image from GPS
    METADATA | EXIF | GPSLongitude | longitude of image from GPS
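
    A minimal sketch of pulling a few of these Exif fields with Pillow is shown below; this is illustrative only, and the metadata already distributed with LADI should be treated as authoritative. The image file name is a placeholder.

        # Minimal sketch: read image size and GPS-related Exif tags with Pillow.
        from PIL import Image
        from PIL.ExifTags import TAGS, GPSTAGS

        with Image.open("example_cap_image.jpg") as img:
            width, height = img.width, img.height
            exif = img._getexif() or {}            # legacy helper; returns tag-id -> value

        named = {TAGS.get(tag_id, tag_id): value for tag_id, value in exif.items()}
        gps = {GPSTAGS.get(k, k): v for k, v in named.get("GPSInfo", {}).items()}
        print("ImageWidth:", width, "ImageHeight:", height)
        print("GPSLatitude:", gps.get("GPSLatitude"), "GPSLongitude:", gps.get("GPSLongitude"))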

    Specifically, from 2015-2019 we observed more than ten camera models, and an overwhelming majority of images were collected at low altitudes of 609 meters (2,000 feet) and below. Additionally, the largest image size exceeded 20 MB, with an average file size of at least 5 MB; the average file size was consistent across the years.

    [Example images: Satellite / Overhead Imagery; Infrastructure]

    Here are examples of unannotated imagery collected by CAP (Civil Air Patrol) and hosted by FEMA. Note that the lighting, orientation, perspective, and resolution vary across the examples. These variations are a key component of the LADI dataset, as they are simply the reality of operational imagery. Any technology to support disaster response will need to handle these variations.

    [Example images: Debris; Flooding; Damage]

Testing Labels/Features

As proposed by Heitz and Koller, a dataset can include things or stuff. A thing is something that can be easily and discretely labeled, whereas stuff is less discrete and may have no clear boundaries. For example, as of 2019, other datasets such as xView and COCO consist only of things, such as book or yacht. Due to one of our public safety-focused objectives, we include both thing and stuff labels. For example, a building is a thing, but there can be an additional label of damage, which is stuff.

    Hierarchical Labeling Scheme

    We defined a reasonable label set that is feasible for crowd sourcing while meeting public safety's needs. The dataset currently employs a hierarchical labeling scheme of five coarse categories, each with more specific annotations. The five coarse categories are:

    • Damage
    • Environment
    • Infrastructure
    • Vehicles
    • Water

    For each of the coarse categories, there are 4-9 more specific annotations:

    Damage: damage (misc), flooding / water damage, landslide, road washout, rubble / debris, smoke / fire
    Environment: dirt, grass, lava, rocks, sand, shrubs, snow / ice, trees
    Infrastructure: bridge, building, dam / levee, pipes, utility or power lines / electric towers, railway, wireless / radio communication towers, water tower, road
    Vehicles: aircraft, boat, car, truck
    Water: flooding, lake / pond, ocean, puddle, river / stream

    Systems are required to use the ID associated with each label/feature in the file dsdi.features.txt in their run submissions.

    Please see the definitions of each of the above 32 features HERE
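
    For convenience, the hierarchy above can be expressed directly in code, as in the sketch below; note that the feature IDs used in submissions must come from dsdi.features.txt, not from any ordering implied here.

        # The 32 testing features grouped by their five coarse categories.
        # (IDs for run submissions must be taken from dsdi.features.txt.)
        LABEL_HIERARCHY = {
            "Damage": ["damage (misc)", "flooding / water damage", "landslide",
                       "road washout", "rubble / debris", "smoke / fire"],
            "Environment": ["dirt", "grass", "lava", "rocks", "sand", "shrubs",
                            "snow / ice", "trees"],
            "Infrastructure": ["bridge", "building", "dam / levee", "pipes",
                               "utility or power lines / electric towers", "railway",
                               "wireless / radio communication towers", "water tower",
                               "road"],
            "Vehicles": ["aircraft", "boat", "car", "truck"],
            "Water": ["flooding", "lake / pond", "ocean", "puddle", "river / stream"],
        }
        assert sum(len(v) for v in LABEL_HIERARCHY.values()) == 32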

Run Training Types:

    Each submitted run must declare the type of training/development data used to produce the results in this specific run file. Three main training types are allowed:
  • LADI-based (L): This should be used if a run has used only the supplied LADI dataset for development of its system.
  • Non-LADI (N): This should be used if a run has used only other training dataset(s), excluding LADI.
  • LADI + Others (O): This should be used if a run has used the LADI dataset in addition to any other dataset(s) for training purposes.
Systems that use a well-known pretrained model (such as any variation of ResNet) and train it on the LADI dataset may use training type L. Only using pretrained weights of an existing model does not warrant the use of training type O. However, if the team also trains the model on another dataset, the training type should be specified as O. Likewise, if a team chooses to train its model from scratch and uses any well-known dataset (along with LADI) to train the model, the run should also be denoted as O. Training type N should be used if the LADI dataset is not used to train the system.
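
The rules above reduce to two questions: was LADI used for training, and was any other dataset used for training (pretrained weights alone do not count)? A minimal sketch with hypothetical flag names:

    # Minimal sketch of the training-type rules; the boolean arguments are hypothetical.
    # Note: merely loading pretrained weights does NOT count as training on another dataset.
    def training_type(used_ladi: bool, trained_on_other_data: bool) -> str:
        if used_ladi and trained_on_other_data:
            return "O"   # LADI + Others
        if used_ladi:
            return "L"   # LADI-based
        return "N"       # Non-LADI

    assert training_type(used_ladi=True, trained_on_other_data=False) == "L"
    assert training_type(used_ladi=True, trained_on_other_data=True) == "O"
    assert training_type(used_ladi=False, trained_on_other_data=True) == "N"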

Run Submission Format:

    Each team may submit a maximum of 4 prioritized runs per training type. The submission formats are described below.

    Please note: Only submissions that are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission that does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml (a validation sketch in Python also follows the list below).

    • Participants in this version of the task will submit results in each run for all and only the 32 selected features, and for each feature at most 1000 shot IDs.
    • Here for download (right click and choose "display page source" to see the entire files) is a DTD for feature extraction results of one main run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check all your submissions to see that they are well-formed.
    • Please submit each of your runs in a separate file, named to make clear which team has produced it. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement that refers to the DTD at NIST via a URL and with a videoFeatureExtractionResults element, even though only one run is included: <!DOCTYPE videoFeatureExtractionResults SYSTEM "https://www-nlpir.nist.gov/projects/tv2021/dtds/videoFeatureExtractionResults.dtd">
    • The run submission page accepts uncompressed XML files.
    • Remember to use the correct shot IDs in your submissions - NIST will make a master shot reference available with the testing data to map each raw video file to its segmented video clips.
    • Submissions will be transmitted to NIST via a password-protected webpage.
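
    As noted above, submissions should be checked against the provided DTD before being uploaded; below is a minimal validation sketch using lxml (Xerces-J, as mentioned earlier, works equally well). The run file name is a placeholder.

        # Minimal sketch: validate a run file against the DTD referenced in its DOCTYPE.
        # "my_team_run1.xml" is a placeholder file name.
        from lxml import etree

        parser = etree.XMLParser(dtd_validation=True, load_dtd=True, no_network=False)
        try:
            etree.parse("my_team_run1.xml", parser)
            print("Run file parses and validates against the DTD.")
        except etree.XMLSyntaxError as err:
            print("Validation failed:", err)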

Evaluation and Metrics:

Metric | Description
Speed | Clock time per inference, reported in seconds (to one decimal place), must be provided by participants for each run.
Mean Average Precision | Average precision will be calculated for each feature, and the mean average precision reported for each submission.
Recall | True positive, true negative, false positive, and false negative rates.
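
For development-time sanity checks, below is a minimal sketch of non-interpolated average precision per feature and the mean over features; NIST's official scoring remains authoritative and may differ in details such as the handling of the 1000-clip cutoff.

    # Minimal sketch: non-interpolated average precision per feature, then the mean.
    def average_precision(ranked_clips, relevant_clips):
        """ranked_clips: clip IDs in ranked order; relevant_clips: set of true positives."""
        hits, precision_sum = 0, 0.0
        for rank, clip_id in enumerate(ranked_clips, start=1):
            if clip_id in relevant_clips:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_clips) if relevant_clips else 0.0

    def mean_average_precision(run, ground_truth):
        """run and ground_truth are dicts keyed by feature ID."""
        aps = [average_precision(run[f], ground_truth[f]) for f in ground_truth]
        return sum(aps) / len(aps)

    # Toy example: feature 1 has two relevant clips, found at ranks 2 and 3.
    run = {1: ["clip_0003", "clip_0001", "clip_0002"]}
    truth = {1: {"clip_0001", "clip_0002"}}
    print(mean_average_precision(run, truth))   # (1/2 + 2/3) / 2 = ~0.583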

Open Issues:

  • Participants are welcome to provide feedback about ideas for heterogeneous sensor fusion and how to facilitate collaboration among performers. Collaboration is key due to the disaster response context and the need to support the public safety community.