Disaster Scene Description and Indexing (DSDI)

Task Coordinators: Jeffrey Liu, William Drew, George Awad, and Asad Butt

Computer vision capabilities have been advancing rapidly and are expected to become an important component of incident and disaster response. However, the majority of computer vision capabilities do not meet public safety’s needs, such as support for search and rescue, due to the lack of appropriate training data and requirements. For example, in 2019 a leading computer vision benchmark mislabeled a flooded region as a “toilet” and a highway surrounded by flooding as a “runway.” In response, we’ve developed a dataset of images collected by the Civil Air Patrol of various natural disasters. Two key distinctions are the low-altitude, oblique perspective of the imagery and the disaster-related features, both of which are rarely featured in computer vision benchmarks and datasets. This task invites researchers to work in this new domain, develop new capabilities, and close the performance gap in labeling video clips with the correct disaster-related feature(s).

The pilot for the DSDI task was successfully conducted in TRECVID 2020, and we will continue the task this year.

System Task

To emphasize the unique disaster-related features of the dataset, systems will be given a set of natural disaster-related features and an unseen (testing) dataset of short video clips collected from real-world disaster events. For each feature, systems are asked to return a ranked list of the top video clips that include the feature. All run submissions should be the output of a fully automatic system. Interactive system submissions will not be accepted.
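As a rough illustration of the expected output, the sketch below turns per-clip confidence scores into one ranked list per feature. The score dictionary layout, the clip and feature names, and the `max_results` parameter are assumptions made for this example, not part of the task definition; the cap of 1000 mirrors the limit stated in the run submission format.

```python
from typing import Dict, List


def rank_clips(scores: Dict[str, Dict[str, float]],
               max_results: int = 1000) -> Dict[str, List[str]]:
    """Return, for each feature, clip IDs ordered from most to least confident.

    scores[feature][clip_id] is a hypothetical confidence value produced by
    a participant's own system.
    """
    ranked = {}
    for feature, clip_scores in scores.items():
        # Sort by descending score; break ties by clip ID so output is stable.
        ordered = sorted(clip_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        ranked[feature] = [clip_id for clip_id, _ in ordered[:max_results]]
    return ranked


# Made-up scores for two features and three clips.
example_scores = {
    "flooding": {"clip_a": 0.91, "clip_b": 0.15, "clip_c": 0.67},
    "bridge": {"clip_a": 0.05, "clip_b": 0.80, "clip_c": 0.80},
}
ranking = rank_clips(example_scores, max_results=2)
```

With these made-up scores, `ranking["flooding"]` lists `clip_a` before `clip_c`, and the tie for `bridge` is resolved alphabetically.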

Data Resources

The task will be supported by the LADI (Low Altitude Disaster Imagery) dataset for training and development, and by a separate pilot testing dataset for testing systems. For an introduction to the LADI dataset, which was developed as part of a larger NIST Public Safety Innovation Accelerator Program (PSIAP) grant, please watch this 5-minute YouTube video.

Development dataset

The video testing dataset from 2020 and its annotations are available for teams to train their systems.

Dataset Scope

The initial release of the LADI dataset focused on the following set of criteria. The crowdsourced human annotations were prioritized based on these criteria.

Criteria	Values
Administrative Boundaries	alabama, florida, georgia, louisiana, mississippi, north carolina, puerto rico, south carolina, texas, virgin islands, virginia
Months	March - November
Years	2015 - 2019
Altitude (AGL ft)	<= 1000
Image size (MB)	>= 4

Since 2015, each of these locations has had a FEMA major disaster declaration for a hurricane or flooding. Three of these (Louisiana, South Carolina, and Texas) also had a major declaration for flooding during the Atlantic hurricane season months of June to November. The scope also included four of the five locations with the most images collected from 2015-2018. The fifth location, which is not represented in the scope, was California with its significant wildfire activity.

The low altitude criterion is intended to further distinguish the LADI dataset from satellite or "top down" datasets and to support development of computer vision capabilities for small drones operating at low altitudes. A minimum image size was selected to maximize the efficiency of the crowdsource workers; lower resolution images are harder to annotate. The 4 MB threshold was slightly less than the annual file size average.
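The scope criteria in the table above amount to a simple filter over image records. The following is a minimal sketch of that filter; the `ImageRecord` class and its field names are hypothetical and do not correspond to the actual LADI metadata schema.

```python
from dataclasses import dataclass

# Locations listed under "Administrative Boundaries" in the scope table.
IN_SCOPE_LOCATIONS = {
    "alabama", "florida", "georgia", "louisiana", "mississippi",
    "north carolina", "puerto rico", "south carolina", "texas",
    "virgin islands", "virginia",
}


@dataclass
class ImageRecord:
    """Hypothetical record; field names are assumptions for illustration."""
    location: str
    year: int
    month: int             # 1 = January ... 12 = December
    altitude_agl_ft: float
    size_mb: float


def in_initial_scope(rec: ImageRecord) -> bool:
    """Check a record against the initial-release criteria."""
    return (
        rec.location.lower() in IN_SCOPE_LOCATIONS
        and 2015 <= rec.year <= 2019
        and 3 <= rec.month <= 11          # March through November
        and rec.altitude_agl_ft <= 1000
        and rec.size_mb >= 4
    )


texas_flood = ImageRecord("Texas", 2017, 8, 850.0, 5.2)
california_fire = ImageRecord("California", 2018, 11, 900.0, 6.0)
```

Here `texas_flood` satisfies every criterion, while `california_fire` is excluded because California is outside the listed boundaries.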

For more information about LADI, please refer to the GitHub organization.

Testing dataset

A testing dataset of about 5 hours of video will be distributed to participants according to the published schedule. The testing data will be chosen from operational videos of a recent natural disaster event and will be similar to the testing dataset used in last year's task. The testing data will be segmented into short video clips of at most about 60 seconds each. A master shot reference table, which maps the start and end of each video clip within the original raw video, will be provided for participants to use in their run submissions. In addition, some videos will have an accompanying KMZ file with the path information and the start and end locations (latitude and longitude). The test data (link to videos, master shot table, and metadata) can be found here.
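To illustrate how a master shot reference might be used, the sketch below looks up which clip contains a given time offset in a raw video. The actual file format of the master shot table is not described here, so the `Shot` tuple, the shot ID strings, and the use of seconds are all assumptions for this example.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Shot(NamedTuple):
    shot_id: str       # hypothetical ID format
    start_sec: float   # clip start within the raw video
    end_sec: float     # clip end within the raw video (exclusive)


def find_shot(shots: List[Shot], t: float) -> Optional[Shot]:
    """Return the shot containing time t, or None if no shot covers it.

    Assumes shots are sorted by start time and do not overlap.
    """
    starts = [s.start_sec for s in shots]
    i = bisect_right(starts, t) - 1   # last shot starting at or before t
    if i >= 0 and shots[i].start_sec <= t < shots[i].end_sec:
        return shots[i]
    return None


# Made-up segmentation of one raw video into three clips.
shots = [
    Shot("shot1_1", 0.0, 42.5),
    Shot("shot1_2", 42.5, 100.0),
    Shot("shot1_3", 100.0, 131.0),
]
match = find_shot(shots, 50.0)
```

A binary search keeps the lookup fast even for long videos, although a linear scan would work equally well for a handful of clips.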

Auxiliary Resources

The LADI dataset also includes machine-generated labels from commercial and open-source image recognition tools to provide additional context. MIT LL ran classifiers trained on ImageNet and Places365, as well as various commercial classifiers. Each image is tagged with the top 10 labels from each classifier. The open-source classifier labels were generated using the Lincoln Laboratory Supercomputing Center, and the commercial classifiers were run on the respective commercial vendor’s platform. In particular, we used the pretrained implementation of Inception-ResNetV2 trained on the ImageNet dataset in Keras, and the pretrained implementation of ResNet50 trained on Places365-Standard in PyTorch. We provide these annotations based on practices associated with the YouTube-8M dataset. Here are a couple of examples:

Type	Source	Field	Description
ANNOTATION_MACHINE	Places365	airfield	softmax weight for label class "airfield"
ANNOTATION_MACHINE	ImageNet	tench, Tinca tinca	softmax weight for label class "tench, Tinca tinca"
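The "top 10 labels" with their softmax weights can be derived from a classifier's raw outputs roughly as follows. This is a framework-independent sketch in plain Python, not the code used to produce the LADI annotations; the class names and logit values are invented for illustration.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw scores to probabilities that sum to 1."""
    m = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_labels(logits: Sequence[float],
                 class_names: Sequence[str],
                 k: int = 10) -> List[Tuple[str, float]]:
    """Return the k (label, softmax weight) pairs with the highest weight."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: -pair[1])
    return ranked[:k]


# Invented four-class example; real classifiers have hundreds of classes.
names = ["airfield", "runway", "lake", "highway"]
raw_scores = [2.0, 1.0, 0.1, -1.0]
top2 = top_k_labels(raw_scores, names, k=2)
```

Each returned pair corresponds to one row of the kind shown in the table above, with the label as the field and the softmax weight as its value.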

Metadata

As part of the dataset, we extract and process the metadata and Exif information from each image. This includes information such as date and time, latitude and longitude coordinates, and camera settings. The specific Exif data available varies across the images. Here are a few examples:

Type	Source	Field	Description
METADATA	File	fieldpath	path to file location in filesystem
METADATA	File	HDF5	location of file in HDF5
METADATA	File	filesize	size in bytes
METADATA	EXIF	ImageHeight	height of image in pixels
METADATA	EXIF	ImageWidth	width of image in pixels
METADATA	EXIF	GPSLatitude	latitude of image from GPS
METADATA	EXIF	GPSLongitude	longitude of image from GPS

Specifically, from 2015 to 2019 we observed more than ten camera models, and an overwhelming majority of images were collected at low altitudes of 609 meters (2000 feet) and below. Additionally, the largest image size exceeded 20 MB, with an average file size of at least 5 MB. The average file size was consistent over the years.

Satellite / Overhead Imagery

Infrastructure

Here are examples of unannotated imagery collected by CAP (Civil Air Patrol) and hosted by FEMA. Note that the lighting, orientation, perspective, and resolution vary across the examples. These variations are a key component of the LADI dataset, as they are simply the reality of operational imagery. Any technology that supports disaster response will need to handle them.

[Example images: Debris, Flooding, Damage]

Testing Labels/Features

As proposed by Heitz and Koller, a dataset can include things or stuff. A thing is something that can be easily and discretely labeled, whereas stuff is less discrete and may have no clear boundaries. For example, as of 2019, other datasets such as xView and COCO consist of only things, such as book or yacht. Because of our public safety-focused objectives, we include both thing and stuff labels. For example, a building is a thing, but it can carry an additional label of damage, which is stuff.

Hierarchical Labeling Scheme

We defined a reasonable label set that is feasible for crowdsourcing while meeting public safety's needs. The dataset currently employs a hierarchical labeling scheme of five coarse categories, with more specific annotations for each category. The five coarse categories are:

Damage
Environment
Infrastructure
Vehicles
Water

For each of the coarse categories, there are 4-9 more specific annotations:

Damage	Environment	Infrastructure	Vehicles	Water
damage (misc)	dirt	bridge	aircraft	flooding
flooding / water damage	grass	building	boat	lake / pond
landslide	lava	dam / levee	car	ocean
road washout	rocks	pipes	truck	puddle
rubble / debris	sand	utility or power lines / electric towers		river / stream
smoke / fire	shrubs	railway
	snow / ice	wireless / radio communication towers
	trees	water tower
		road
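One convenient way to hold the hierarchy above in code is a mapping from each coarse category to its specific annotations, as in the sketch below. The variable names are our own choice for illustration. The official feature IDs come from dsdi.features.txt and are deliberately not reproduced or guessed here.

```python
# Coarse category -> specific annotations, transcribed from the table above.
LABEL_HIERARCHY = {
    "Damage": [
        "damage (misc)", "flooding / water damage", "landslide",
        "road washout", "rubble / debris", "smoke / fire",
    ],
    "Environment": [
        "dirt", "grass", "lava", "rocks", "sand", "shrubs",
        "snow / ice", "trees",
    ],
    "Infrastructure": [
        "bridge", "building", "dam / levee", "pipes",
        "utility or power lines / electric towers", "railway",
        "wireless / radio communication towers", "water tower", "road",
    ],
    "Vehicles": ["aircraft", "boat", "car", "truck"],
    "Water": ["flooding", "lake / pond", "ocean", "puddle", "river / stream"],
}

# Reverse lookup: specific annotation -> coarse category.
FINE_TO_COARSE = {
    fine: coarse
    for coarse, fines in LABEL_HIERARCHY.items()
    for fine in fines
}

total_features = sum(len(fines) for fines in LABEL_HIERARCHY.values())
```

Note that "flooding / water damage" (under Damage) and "flooding" (under Water) are distinct labels, so the reverse lookup stays unambiguous.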

Systems are required to use the id associated with the label/feature in the file dsdi.features.txt in their run submissions.

Please see the definitions of each of the above 32 features HERE

Run Training Types:

LADI-based (L) : This should be used if a run has only used the supplied LADI dataset for development of its system.
Non-LADI (N) : This should be used if a run has only used any other training dataset(s) excluding LADI
LADI + Others (O) : This should be used if a run has used LADI dataset in addition to any other dataset(s) for training purposes

Systems that use a well-known pretrained model (such as any variation of ResNet) and train it on the LADI dataset may use the training type L. Only using pretrained weights of an existing model does not warrant the use of training type O. However, if the team also trains the model on another dataset, the training type should be specified as O. Following this, if a team chooses to train their model from scratch, and uses any well-known dataset (along with LADI) to train the model, it should also be denoted as O. The training type N will be used if the LADI dataset is not used to train the system.
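The rules above reduce to a small decision function. The sketch below is one way to express them; the function and parameter names are hypothetical, and it assumes the team has already decided whether any dataset other than LADI was actually used for training (as opposed to only loading pretrained weights).

```python
def training_type(trained_on_ladi: bool, trained_on_other_datasets: bool) -> str:
    """Return the run training type code: "L", "N", or "O".

    Loading pretrained weights of an existing model does not, by itself,
    count as training on another dataset.
    """
    if not trained_on_ladi:
        return "N"                      # LADI not used for training
    if trained_on_other_datasets:
        return "O"                      # LADI plus at least one other dataset
    return "L"                          # LADI only


# A pretrained ResNet fine-tuned only on LADI.
finetuned_resnet = training_type(trained_on_ladi=True,
                                 trained_on_other_datasets=False)
```

In this example `finetuned_resnet` is "L", matching the guidance for pretrained models trained only on LADI.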

Run Submission Format:

Each team may submit a maximum of 4 prioritized runs per training type. The submission formats are described below.

Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your submission before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmision.xml.

In each run, participants will submit results for all of the 32 selected features and only those features, with at most 1000 shot IDs per feature.
Available for download (right click and choose "display page source" to see the entire files) are a DTD for feature extraction results of one main run, the container for one run, and a small example of what a site would send to NIST for evaluation. Please check all your submissions to see that they are well-formed.
Please submit each of your runs in a separate file, named to make clear which team has produced it. EACH file you submit should begin, as in the example submission, with the DOCTYPE statement that refers to the DTD at NIST via a URL and with a videoFeatureExtractionResults element, even if only one run is included.
The run submission page accepts uncompressed XML files.
Remember to use the correct shot IDs in your submissions. NIST will make a master shot reference available with the testing data to map each raw video file to its segmented video clips.
Submissions will be transmitted to NIST via a password-protected webpage.
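Before uploading, a quick local sanity check can catch files that are not well-formed. The sketch below uses Python's standard library to confirm that a file parses as XML, contains a DOCTYPE statement, and has videoFeatureExtractionResults as its root element. It does not validate against the DTD, so a DTD-aware checker such as the Xerces-J command mentioned above is still needed. The DTD URL in the example is a placeholder, not the real NIST location.

```python
import xml.etree.ElementTree as ET
from typing import List


def check_submission(xml_text: str,
                     expected_root: str = "videoFeatureExtractionResults") -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if "<!DOCTYPE" not in xml_text:
        problems.append("missing DOCTYPE statement")
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        problems.append(f"not well-formed: {exc}")
        return problems
    if root.tag != expected_root:
        problems.append(f"unexpected root element: {root.tag}")
    return problems


# Placeholder DTD URL; use the DOCTYPE from the official example submission.
good = (
    '<!DOCTYPE videoFeatureExtractionResults SYSTEM '
    '"https://example.invalid/placeholder.dtd">\n'
    "<videoFeatureExtractionResults></videoFeatureExtractionResults>"
)
bad = "<videoFeatureExtractionResults><run></videoFeatureExtractionResults>"

good_problems = check_submission(good)
bad_problems = check_submission(bad)
```

The well-formed example yields no problems, while the second one is flagged for both the missing DOCTYPE and the unclosed element.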

Evaluation and Metrics:

Metric	Description
Speed	Clock time per inference, reported in seconds (to one decimal place), must be provided by participants for each run.
Mean Average Precision	Average precision will be calculated for each feature, and the mean average precision reported for each submission.
Recall	True positive, true negative, false positive, and false negative rates.
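For teams that want to score their own development runs, the sketch below computes the standard non-interpolated average precision for one ranked list and the mean over features. It is an illustration of the metric's usual definition and may differ in detail from the official evaluation procedure; the feature names and shot IDs are invented, and each ranked list is assumed to contain no duplicate IDs.

```python
from typing import Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Non-interpolated AP: mean of precision at each relevant hit,
    divided by the total number of relevant items."""
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_ids, start=1):
        if shot_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank   # precision at this rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(run: Dict[str, List[str]],
                           truth: Dict[str, Set[str]]) -> float:
    """Average the per-feature AP over all features in the ground truth."""
    aps = [average_precision(run.get(f, []), rel) for f, rel in truth.items()]
    return sum(aps) / len(aps) if aps else 0.0


# Invented run and ground truth for two features.
run = {"flooding": ["a", "b", "c", "d"], "bridge": ["x", "y"]}
truth = {"flooding": {"a", "c"}, "bridge": {"y"}}
ap_flooding = average_precision(run["flooding"], truth["flooding"])
map_score = mean_average_precision(run, truth)
```

For the "flooding" list, the relevant shots appear at ranks 1 and 3, giving an AP of (1/1 + 2/3) / 2; the "bridge" list has its only relevant shot at rank 2, giving 0.5.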

Open Issues:

Participants are welcome to provide feedback about ideas for heterogeneous sensor fusion and how to facilitate collaboration among performers. Collaboration is key because of the disaster response context and the need to support the public safety community.

Digital Video Retrieval at NIST