A number of datasets are available for use in TRECVID 2025 and are described below.
This dataset supports the Ad-hoc video search (AVS) task as the training dataset. The V3C1 dataset (drawn from the larger V3C video dataset) is composed of 7475 Vimeo videos (1.3 TB, 1000 h) with Creative Commons licenses and a mean duration of 8 min. All videos have some metadata available, e.g., title, keywords, and description, in JSON files. The dataset has been segmented into 1,082,659 short video segments according to the provided master shot boundary files. In addition, keyframes and thumbnails for each video segment have been extracted and are available.
The raw V3C1 dataset, including metadata, will be available for download from the servers of ITEC - Institute of Information Technology, while segmented shots from the raw videos will be available for download from NIST. Information about downloading V3C1 from ITEC university can be obtained from the active participants tv25 data servers file.
This dataset supports the Ad-hoc video search (AVS) task as the testing dataset. The V3C2 dataset (drawn from the larger V3C video dataset) is composed of 9760 Vimeo videos (1.6 TB, 1300 h) with Creative Commons licenses and a mean duration of 8 min. All videos have some metadata available, e.g., title, keywords, and description, in JSON files. The dataset has been segmented into 1,425,454 short video segments according to the provided master shot boundary files. In addition, keyframes and thumbnails for each video segment have been extracted and are available.
Data use agreements and Distribution: See Data use agreements for download instructions for active participants from NIST/mirror servers and from ITEC university.
The raw V3C2 dataset, including metadata, is available for download from the servers of ITEC - Institute of Information Technology, while segmented shots from the raw videos will be available for download from NIST. Information about downloading V3C2 from ITEC university can be obtained from the active participants tv25 data servers file.
Data use agreements and Distribution: See Data use agreements for download instructions for active participants from NIST.
The IACC.3 dataset consists of approximately 4600 Internet Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations ranging from 6.5 min to 9.5 min and a mean duration of almost 7.8 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.
Data use agreements and Distribution: Download for active participants from NIST/mirror servers. See Data use agreements.
Master shot reference, Automatic speech recognition (for English), and ground truth (used between 2016-2017): Available by download from the TRECVID Past Data page
Three datasets (A, B, C) totaling approximately 7300 Internet Archive videos (144 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations ranging from 10 s to 6.4 min and a mean duration of almost 5 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.
NOTE: Be sure to reload the relevant collection.xml files (A, B, C) in the master shot reference and remove files with a "use" attribute set to "dropped" - these are no longer available under a Creative Commons license and are not part of the test collection.
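The filtering step in the note above could be sketched as follows. This is only an illustration under stated assumptions: the element names (`VideoFileList`, `VideoFile`, `filename`) and the sample entries are hypothetical, and the real collection.xml layout may differ. The only detail taken from the note is that entries carry a "use" attribute that may be set to "dropped".

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names and values are assumptions made
# for illustration, not the actual collection.xml schema.
SAMPLE_XML = """
<VideoFileList>
  <VideoFile use="test"><id>1</id><filename>a.mp4</filename></VideoFile>
  <VideoFile use="dropped"><id>2</id><filename>b.mp4</filename></VideoFile>
  <VideoFile use="test"><id>3</id><filename>c.mp4</filename></VideoFile>
</VideoFileList>
"""


def split_by_use(root):
    """Partition elements that carry a 'use' attribute into (kept, dropped)."""
    kept, dropped = [], []
    for elem in root.iter():
        use = elem.get("use")
        if use is None:
            continue  # no 'use' attribute, so not treated as a file entry
        (dropped if use == "dropped" else kept).append(elem)
    return kept, dropped


root = ET.fromstring(SAMPLE_XML.strip())
kept, dropped = split_by_use(root)
kept_names = [e.findtext("filename") for e in kept]
dropped_names = [e.findtext("filename") for e in dropped]
print("keep:", kept_names)
print("remove:", dropped_names)
```

Matching on the attribute rather than on a tag name keeps the sketch independent of the exact element names, which should be checked against the real files before use.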
Data use agreements and Distribution: Download for active participants from NIST/mirror servers. See Data use agreements.
Master shot reference, Automatic speech recognition (for English), and ground truth (used between 2013-2015): Available by download from the TRECVID Past Data page
Three datasets (A, B, C) totaling approximately 8000 Internet Archive videos (160 GB, 600 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations between 10 s and 3.5 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.
Data use agreements and Distribution: Available by download from the Internet Archive. See TRECVID Past Data page. Alternatively, download from the copy on the Dublin City University server, but use the collection.xml files (see TRECVID Past Data page) for instructions on how to check the current availability of each file.
Master shot reference, Automatic speech recognition (for English), and ground truth (used between 2010-2012): Available by download from the TRECVID Past Data page
Approximately 3200 Internet Archive videos (50 GB, 200 h) with Creative Commons licenses in MPEG-4/H.264 format, with durations between 3.6 and 4.1 min. Most videos have some donor-provided metadata available, e.g., title, keywords, and description.
Data use agreements and Distribution: Available by download from the Internet Archive. See TRECVID Past Data page. Alternatively, download from the copy (see tv2010 directory) on the Dublin City University server, but use the collection.xml files (see TRECVID Past Data page) for instructions on how to check the current availability of each file.
Master shot reference: Available by download from the TRECVID Past Data page
Common feature annotation: Available by download from the TRECVID Past Data page
Automatic speech recognition (for English): Available by download from the TRECVID Past Data page
In order to be eligible to receive the data, you must have applied for participation in TREC/TRECVID. Your application will be acknowledged by NIST with a team ID, active participant's password, and information about how to obtain the data.
Note that if you signed the permission form last year for IACC.2, IACC.3, V3C1, or V3C2 and do not need to replace your original copy, you do not need to submit another permission form this year.
In your email include the following:
As Subject: "TRECVID data request"
In the body:
    your name
    your short team ID (given when you applied to participate)
    the kinds of data you will be using, one or more of the following: IACC.2, IACC.3, V3C1, V3C2, etc.
You will receive instructions on how to download the data.
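As a rough illustration, the request email described above could be assembled with Python's standard `email` package. This is a sketch, not a required format: the function name `build_data_request`, the field labels in the body ("Name:", "Team ID:", "Datasets:"), and the sample values are all assumptions. The recipient address is not given in this section, so it is deliberately left unset.

```python
from email.message import EmailMessage


def build_data_request(name, team_id, datasets):
    """Build a data request message with the subject and body fields listed above."""
    msg = EmailMessage()
    msg["Subject"] = "TRECVID data request"
    # Body labels are illustrative assumptions; the source only lists
    # which pieces of information to include.
    body = "\n".join([
        f"Name: {name}",
        f"Team ID: {team_id}",
        f"Datasets: {', '.join(datasets)}",
    ])
    msg.set_content(body)
    return msg


# Placeholder values for illustration only.
msg = build_data_request("Jane Doe", "EXAMPLE_TEAM", ["V3C1", "V3C2"])
print(msg["Subject"])
print(msg.get_content())
```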
Requests are handled in the order they are received. Please allow 5 business days for NIST to respond to your request. To download the IACC or V3C data, use the access codes sent to you by email together with the data server URLs listed in the active participant's area.