Digital Video Test Collection

Carolyn Schmidt & Paul Over

National Institute of Standards and Technology

100 Bureau Drive, Stop 8940

Gaithersburg, MD 20899

{cschmidt, over}@nist.gov

Abstract

By the end of 1999, the National Institute of Standards and Technology (NIST) plans to release the first installment of a public-domain digital video test collection. This first installment contains a series of digitized videos and their transcripts which can be used by researchers in content-based information retrieval (IR) and related fields as a collection in which search, retrieval, and analysis can be performed. This paper provides an introduction to our work in this area, the background and status of the digital video test collection, and information on how to access the collection.

  1. Introduction

    NIST's Information Technology Laboratory (ITL), among other things, develops reference data sets and evaluation software, proof-of-concept implementations, tests and test methods. The Information Access Division (IAD) within ITL, and the Natural Language Processing and Information Retrieval (NLPIR) group within IAD, support this mission by encouraging IR research based on large text collections, increasing communication by creating an open forum for exchange of research, expediting technology transfer from the research laboratory into commercial products, and increasing the availability of IR test collections and evaluation techniques.

    While NLPIR and IR research have historically focused on text, the importance of work in content-based retrieval from non-text media is growing. Our review of recent literature on content-based retrieval from digital video indicates that there is a scarcity of reusable public-domain digital video data. Our experience as creators of collections for research in text information retrieval suggests that a public-domain digital video collection of realistic size and complexity could act as a powerful enabler: encouraging more researchers to address real-world problems and supporting the scientific comparison of solutions. We have built the first installment of such a collection and describe that work in what follows.

    Researchers at the University of North Carolina (UNC) have independently identified the same need and begun work on a collection. We believe our digital video collection is a subset of what UNC proposes as "a large video collection for use as a community testbed" and we anticipate future collaboration with them. [Information on UNC's Open Video Project can be found at http://iris.ils.unc.edu/~openvid]

  2. Digital Video Collection

    2.1 Background and Status

    Initial requirements for the collection were formulated by reviewing the literature and soliciting input from several researchers in the field of video storage and retrieval for information. The main questions and our interpretation of what we read and heard are summarized below.

    Who would use such a collection and for what purposes?

    Opinions indicated that a digital video collection would be used by academic, commercial, and government researchers and developers interested in content-based retrieval and browsing of digital video. Application areas would include health care (e.g., medical education, training, and information); education and lifelong learning (e.g., distance and on-line training); libraries (e.g., archival and record keeping); government services (e.g., national security and defense); public safety (e.g., surveillance); and entertainment.

    What should be the characteristics (desired and required) of the base data in such a general purpose collection?

    All agreed that data with visual, audio, and text combined would be most useful. One respondent suggested video footage where the visual experience is the primary source of information/entertainment, for example, sports video, aerial surveillance, or satellite footage. Content scenes should include different levels of motion (static to fast moving objects), close-up figures (talking heads), and outdoor and indoor shots (particularly a laboratory or conference room environment). Composition should include short and long shots, and camera motion boundaries (pans and zooms). Additionally, cuts, fades, and dissolves would enhance the usefulness. Some sources noted that audio should be presented with background noise and speech, music and speech, as well as simple narration. Inclusion of closed captioning was recommended. The suggested duration of video varied from 30 second clips to 10-20 minutes in total length. The format (or compression) of the video varied as well. Those mentioned were MPEG-1 (with a space/quality tradeoff), MPEG-2, or MPEG-4. Because of the potential size of such a collection, recommended distribution methods included digital versatile disk (DVD), tape, or web server.

    What reference data might be of use?

    Suggestions for reference data ("truth") varied more than suggestions for the base data itself, reflecting a variety of research interests at various levels of "understanding". One source mentioned having metadata attributes with the ability to preview "postage stamp size" compressed clips. Suggested reference data derived from human intervention included the original script of the video (including scene description and stage directives); a human description of clips, individual scenes, or the entire video; a catalog describing the content of a larger unit of video (or some form of keyword hierarchy); a transcript of all spoken words; and labeling of scene cuts. Additionally, reference data that identifies objects, actions, and/or events was mentioned.

    What sources are you aware of for data of the type you described?

    A number of potential sources were named including government and commercial producers of video content: broadcast/cable networks, government agencies and archives, commercial stock film houses, video from particular projects such as MedSpeak, etc.

    2.2 Test Collection

    Based on the above input, several potential video sources were contacted. We encountered the expected barriers of cost and complicated intellectual property restrictions. While we intend to pursue a number of these sources further, for the initial installment of the collection we chose a federal government source which, although less rich in terms of its characteristics than we had originally planned, was easily accessible and usable: eight closed-caption videos, totaling over two hours in length, selected from NIST's public-domain archive of marketing, technical, and educational material. The videos were chosen because, together, they exhibit what we believe to be a useful variety of characteristics in domain, genre, and production techniques. These characteristics include, but are not limited to, different levels of motion (static to fast moving objects), close-up figures (talking heads, moving arms, and moving hands), outdoor and indoor shots (laboratory, auditorium, and conference room environments), and various levels and quality of audio (music and dialog). The videos were produced in English between 1992 and 1998, with closed-caption text readily available. These professionally edited videos were produced mostly in color, with some black and white.

    Although we believe our digital video collection is a subset of what UNC proposes, it differs from UNC's "initial" efforts. UNC's assemblage of video clips from films of the National Archives is mostly in black and white and uses MPEG-1 compression for access via a web browser. While UNC's project plan indicates future growth of their collection and its characteristics, we are currently able to provide NIST videos because we obtained the necessary copyright clearances to use each video in its entirety for research purposes. We are also able to provide video transcriptions for use as initial truth data for each video.

    Below is the title, length (hours:minutes:seconds), and brief description of each documentary video:

    NIST in 5 Minutes and 41 Seconds (00:05:41)
    Informational tour of the agency and its efforts to promote economic growth by working with industry to develop and apply technology, measurements, and standards. (1997)
    Enhanced Aerial Lift Controller (00:09:00)
    Describes how the controller may provide solutions to many jobs that cannot be addressed with existing commercial aerial lifts. (1997)
    Portsmouth Flexible Manufacturing Workstation (00:08:15)
    Describes the Portsmouth Fastener Workstation, which makes accurate threaded fasteners for Navy ships. (1992)
    You Don't Have To Be There…Telepresence Microscopy (00:12:30)
    The program shows how telepresence can provide the potential for remote, instantaneous, around-the-clock access to critical metrology services using the Internet. (1998)
    A Decade of Business Excellence for America (00:08:50)
    Highlights the decade of excellence as seen through the Malcolm Baldrige National Quality Award and summarizes its accomplishments. (1998)
    A Uniquely Rewarding Experience (00:07:50)
    Describes the advantages of becoming a Baldrige Quality Award examiner. Testimonials from current and past examiners are featured. (1997)
    Aircraft Hangar Fires: Fire Protection Improvements (00:09:00)
    Describes how NIST and the U.S. Navy conducted tests on sprinkler and heat detection systems in high bay aircraft hangars in Iceland and Hawaii. (1996)
    Engineer in Space (01:14:00)
    Public lecture which describes a NIST engineer’s adventure and research on two missions aboard the space shuttle Columbia. Lecture is followed by ~30 minutes of questions and answers. (1998)
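
    As a quick check on the statement that the eight videos total over two hours, the listed lengths can be summed. The following is a minimal sketch; the helper names to_seconds and to_hms are ours and purely illustrative, and the durations are copied from the list above.

```python
# Lengths (hh:mm:ss) copied from the list of videos above.
DURATIONS = [
    ("NIST in 5 Minutes and 41 Seconds", "00:05:41"),
    ("Enhanced Aerial Lift Controller", "00:09:00"),
    ("Portsmouth Flexible Manufacturing Workstation", "00:08:15"),
    ("You Don't Have To Be There…Telepresence Microscopy", "00:12:30"),
    ("A Decade of Business Excellence for America", "00:08:50"),
    ("A Uniquely Rewarding Experience", "00:07:50"),
    ("Aircraft Hangar Fires: Fire Protection Improvements", "00:09:00"),
    ("Engineer in Space", "01:14:00"),
]


def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total):
    """Convert a number of seconds back to an hh:mm:ss string."""
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


total_seconds = sum(to_seconds(length) for _, length in DURATIONS)
print(to_hms(total_seconds))  # 02:15:06
```

    The total comes to 2 hours, 15 minutes, and 6 seconds, more than half of which is the single lecture "Engineer in Space".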

    We intend to provide each of the above videos on a DVD-ROM using both MPEG-1 and MPEG-2 compression formats with default parameter settings specified by the encoding software (see Table 1). The SIGIR post-conference workshop on Multimedia Search and Retrieval serves as a forum to acquire feedback and additional encoding needs and requirements before mastering the DVD-ROM for replication.

    Table 1: Proposed Compression Parameters

    Parameter name                "Default" MPEG-1    "Default" MPEG-2
    --------------------------    ----------------    ----------------
    Total bit rate (Mbps)         1.4112              2.3352
    Video bit rate (Mbps)         1.12                2
    Audio bit rate (kbps)         224                 224
    Video resolution              MPEG-1 SIF          MPEG-2 Half D-1
    Audio type                    MPEG-1 layer 2      MPEG-1 layer 2
    Audio sampling rate (kHz)     44.1                44.1
    Coding mode                   Stereo              Stereo
    Drop frame mode               No                  No
    Scene change detect           Yes                 --
    3:2 inverse pulldown          No                  No
    Closed GOP                    No                  No
    Frame sampling                --                  --
    16:9 aspect ratio             --                  No
    Intra-picture distance        15                  15
    Reference picture distance    3                   3
    GOP size                      1                   --
    VBV buffer size               40                  177
    Initial VBV                   40960               181248
    Size (Mb)                     64.5                106.7

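
    The total bit rates in Table 1 give a rough idea of how much disc space the encoded collection would need. The sketch below is only an estimate under stated assumptions: the encoders hold the total (multiplexed) bit rate constant, one megabyte is 10^6 bytes, the collection runs 02:15:06 (the sum of the listed video lengths), and the target is a single-layer disc of about 4.7 GB. The disc capacity and the function name estimated_size_mb are our assumptions, not part of the encoding plan.

```python
# Total running time of the eight videos: 02:15:06 (sum of the listed lengths).
TOTAL_SECONDS = 2 * 3600 + 15 * 60 + 6

# Total (multiplexed) bit rates from Table 1, in megabits per second.
TOTAL_BIT_RATE_MBPS = {"MPEG-1": 1.4112, "MPEG-2": 2.3352}

# Assumption: single-layer DVD capacity in decimal megabytes (not stated in the paper).
DVD_CAPACITY_MB = 4700.0


def estimated_size_mb(duration_s, bit_rate_mbps):
    """Megabits per second times seconds gives megabits; divide by 8 for megabytes."""
    return duration_s * bit_rate_mbps / 8.0


sizes_mb = {
    name: estimated_size_mb(TOTAL_SECONDS, rate)
    for name, rate in TOTAL_BIT_RATE_MBPS.items()
}
combined_mb = sum(sizes_mb.values())
fits_on_disc = combined_mb <= DVD_CAPACITY_MB

for name, size in sizes_mb.items():
    print(f"{name}: about {size:.0f} MB")
print(f"Both encodings: about {combined_mb:.0f} MB; fits on one disc: {fits_on_disc}")
```

    Under these assumptions the two encodings together need roughly 3.8 GB, which leaves room on such a disc for the comparatively small transcript and closed-caption text files.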

    In addition to the base data, we have included an ASCII version of each video’s transcript, as well as an ASCII version of the closed caption text. No additional reference data has been included in this initial release.

    We intend to gather feedback on the use of this collection, the need for additional data, and further requirements for reference data ("truth"), and we welcome your input on each of these points. Our goal is potentially to use the Text Retrieval Conference (TREC) as a forum for evaluating retrieval of digital video information.

  3. Obtaining and Using the Collection

    We expect the cost for the DVD-ROM to be about $50.00. Because the videos were produced using appropriated government funds, no formal permissions to use the video collection are necessary, so long as each video is used in its entirety. The following statement will apply to all purchasers:

    "Videos on this DVD may be reproduced in their entirety as originally released to the public. As released, they are in the public domain. However, video footage in these programs may not be used in other programs without the express written consent of the copyright holders. In addition, portions of the videos that include the sound track may not be combined together to create new programs without securing and paying for music licenses and reuse fees to the narration talent. For more information, contact Ron Meininger, NIST Public and Business Affairs, (301) 975-2761 or email mein@nist.gov."

    The digital video collection and transcripts can be accessed using a DVD drive, with a compatible MPEG video decoder. [Public-domain or free video decoders are available from http://www.mpeg.org/MPEG/video.html]

    Once created, the DVD-ROM will be available from NIST's Standard Reference Data service:

    Standard Reference Data

    National Institute of Standards and Technology

    100 Bureau Drive, Stop 2310

    Gaithersburg, MD USA 20899-2310

    Web: http://www.nist.gov/srd

    Voice: (301) 975-2008

    Email: srdata@nist.gov

    Fax: (301) 926-0416

    For technical information contact:

    Carolyn Schmidt or Paul Over

    100 Bureau Drive, Stop 8940

    Gaithersburg, MD 20899-8940

    Voice: (301) 975-{3243 or 6784}

    Email: {cschmidt, over}@nist.gov

    Fax: (301) 975-5287

    URL: http://www.nist.gov/itl/div894/894.02