Ad-hoc Video Search (AVS)
The Ad-hoc search task models the end-user search use case: a user searching (with textual sentence queries) for segments of video containing persons, objects, activities, locations, etc., and combinations of these.
 The Internet Archive (IACC.3) dataset was used from 2016 to 2018. From 2019 to 2021, a new data collection (V3C1), based on the Vimeo
 Creative Commons (V3C) dataset, was adopted. Starting in 2022, the task used a new sub-collection, V3C2, to test systems on a new
 set of queries in addition to a common (fixed) progress query set used to measure system progress from 2022 to 2024.
  In 2025, the AVS track will re-run the 2024 main queries. This will allow system progress to be measured across two years (2024-2025).
  The ground truth for the 2024 main queries has not been released, so it can be used to score new runs.
System Task
Given the video test collection (V3C2), the master shot boundary reference, and a
set of textual Ad-hoc queries (approx. 20 queries) released by NIST, return for each query a list
of at most 1000 shot IDs from the test collection, ranked according to their likelihood of
containing the target query.
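
The ranking step described above can be sketched as follows. This is only an illustrative sketch: the function name `rank_shots`, the score dictionary, and the toy shot IDs are assumptions, and how the scores are produced is entirely up to each system.

```python
MAX_SHOTS_PER_QUERY = 1000  # limit stated in the task description


def rank_shots(scores, limit=MAX_SHOTS_PER_QUERY):
    """Return at most `limit` shot IDs, most likely first.

    `scores` maps a shot ID to a relevance score for one query
    (hypothetical system output). Ties are broken by shot ID so the
    resulting ranking is deterministic.
    """
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [shot_id for shot_id, _ in ordered[:limit]]


# Toy scores for a single query (illustrative values only).
toy_scores = {"shot00001_1": 0.12, "shot00001_2": 0.87, "shot00002_5": 0.45}
ranked = rank_shots(toy_scores)
```

Sorting on the negated score keeps the highest-scoring shot at rank 1, and truncating afterwards enforces the 1000-shot cap per query.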
  
Data Resources
  
  
  
  
    
    - Sharing of components: Previous and current teams are encouraged to share system components (concept scores, model outputs, codebase resources, etc.) with other teams, to ease the way for new teams and to gain better
	      insight into system behaviour through broader experiments.
	   
Participation types
There will be 3 types of participation:
- Participation by submitting only automatic runs for TRECVID evaluation
- Participation by submitting automatic runs for TRECVID evaluation and by using an interactive system during the next VBS (Video Browser Showdown)
- Participation by using only an interactive system during the next VBS (Video Browser Showdown)
For questions about participation in the next VBS please contact the VBS organizers: Werner Bailer,
Cathal Gurrin, or Klaus Schoeffmann.
 
Allowed training categories:
 Four training types are allowed in general. Each run must declare which training type it used to generate its results.
- A - used only V3C1 data and annotations (strict).
  
- D - used any other training data with any annotation (except the testing data V3C2).
  
- E - used only training data collected automatically using only the official query textual description.
  
- F - used only training data collected automatically using a query built manually from the given official query textual description.
 The training categories E and F support experiments using a no annotation condition.
The idea is to promote the development of methods that permit the indexing of concepts
in video shots using only data from the Web or archives without the need of additional annotations.
The training data could for instance consist of images or videos retrieved by a general purpose search engine
(e.g. Google) using only the query definition with only automatic processing of the returned results.
 
By "no annotation", we mean here that no annotation should be manually done on the retrieved samples (either images or videos).
Any annotation done by somebody else prior to the general search does not count. Methods developed in this context
could be used for building indexing tools for any concept starting only from a simple query defined for it.
 Please note these restrictions and information on training types.
Run submission types:
Three main submission types will be accepted:
- Fully automatic (F) runs (no human input in the loop): The system takes the official query as input and generates results without any human intervention.
- Manually-assisted (M) runs: A human may formulate the initial query once, based on the topic and the query interface, not on knowledge of the collection or search results.
    The system then takes the formulated query as input and generates results without further human intervention.
- Relevance-Feedback (R) runs: The system takes the official query as
    input and generates initial results. A human judge may then assess
    up to the top 30 results and feed this
    information back to the system to generate a new set of results. This feedback loop is permitted for up to 3 iterations ONLY.
- One extra run of the novelty type (N) may be submitted within the main task. The goal of this run is to encourage systems to submit
        novel and unique relevant shots not easily discovered by other runs. Each team may submit only 1 novelty run.
        Please note the required XML field in the DTD file indicating whether the run is of novelty or common type.
  
- Optional attributes are available for teams to submit explainability results. This is supported in the xml run files
as optional timestamp and bounding box coordinates (please see the
XML DTD container for run files) to localize "why" the submitted shot is relevant to the query. The goal of this initiative is to allow
for more detailed diagnostic results for teams and NIST and to assess if submitted results are relevant for the correct reasons.
- Given the two main paradigms in video search (concept-based vs visual-semantic embedding), an additional run type will be supported for submission
  to better understand the differences, strengths and weaknesses of these two approaches. Systems can submit 1 additional run based ONLY on concept detection
  per submission type, for a maximum of 3 concept-based runs (if submitting under all 3 submission types F, M and R).
  Please consult the DTD file for how to label a concept-based run in your XML submission file.
- The submission types (automatic, manually-assisted, relevance feedback) are orthogonal to the training types (A, D, E, F).
Each team may submit a maximum of 4 prioritized runs, per submission type, with 2
additional if they are of the "no annotation" training type (E or F) and the others are not. The submission formats are described below.
- Please note: Only submissions which are valid when checked against the supplied DTDs will be accepted. You must check your
submission, and correct it if needed, before submitting it. NIST reserves the right to reject any submission which does not parse correctly against the provided
DTD(s). Various checkers exist, e.g., Xerces-J: java sax.Counter -v YourSubmission.xml.
- We highly encourage each team to submit 1 baseline run for their system approach, so that comparison and analysis against more advanced runs can be conducted.
  When submitting the baseline run, please make sure the run file name carries a clear label (e.g. "baseline").
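
As a rough aid, the per-submission-type run limits above can be checked before submitting. The sketch below encodes one reading of the rule (at most 4 runs per submission type, plus up to 2 more if those extras use training type E or F and no more than 4 runs use A or D). It does not cover the extra novelty run or the concept-based runs, and the function name `check_run_counts` and its input format are assumptions, not part of any official tooling.

```python
from collections import Counter

SUBMISSION_TYPES = {"F", "M", "R"}  # fully automatic, manually-assisted, relevance feedback
TRAINING_TYPES = {"A", "D", "E", "F"}
NO_ANNOTATION = {"E", "F"}          # the "no annotation" training types
MAX_MAIN_RUNS = 4
MAX_EXTRA_NO_ANNOTATION = 2


def check_run_counts(runs):
    """Return a list of problems for (submission_type, training_type) pairs.

    An empty list means the counts fit the reading of the rule
    described in the text above this block.
    """
    problems = []
    regular = Counter()        # runs trained with A or D, per submission type
    no_annotation = Counter()  # runs trained with E or F, per submission type
    for sub, train in runs:
        if sub not in SUBMISSION_TYPES:
            problems.append(f"unknown submission type {sub!r}")
        elif train not in TRAINING_TYPES:
            problems.append(f"unknown training type {train!r}")
        elif train in NO_ANNOTATION:
            no_annotation[sub] += 1
        else:
            regular[sub] += 1
    for sub in sorted(SUBMISSION_TYPES):
        total = regular[sub] + no_annotation[sub]
        if total <= MAX_MAIN_RUNS:
            continue
        # Past 4 runs, only up to 2 E/F extras on top of at most 4 A/D runs fit.
        if regular[sub] > MAX_MAIN_RUNS or no_annotation[sub] > MAX_EXTRA_NO_ANNOTATION:
            problems.append(
                f"too many runs for submission type {sub}: "
                f"{regular[sub]} A/D + {no_annotation[sub]} E/F"
            )
    return problems
```

For example, four fully automatic runs trained with D plus two trained with E pass this check, while five runs trained with D do not.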
Run submission format:
-  Participants will submit results against V3C2 data in each run for all and only the 20 main queries and for each query at most 1000 shot IDs.
-  The DTD includes an XML field to determine the task type ("M") as well as the run types. Please note that in 2025 there is no progress run (P) type.
-  Available for download (right click and choose "display page source" to see the entire files) are a
DTD file for Ad-hoc search results of one main run,
the container for one run, and a
small example of what a site would send to NIST for evaluation.
Please check all your submissions to see that they are well-formed.
- Please submit each of your runs in a separate file, named to make clear which team has produced it.  EACH file you submit
should begin, as in the example submission, with the DOCTYPE statement:
 
 that refers to the DTD at NIST via a URL and with a videoAdhocSearchResults element even though there is only one run included.
- Remember to use the correct shot IDs in your submissions. The shot IDs take the form of "shotXXXX_YY", where XXXX is the original video ID and YY is the segmented shot ID.
Please don't use any keyframe-associated file names in your submissions. Consult the V3C2 readme file
for more information about submitting the correct shot file names and about the master shot reference for the V3C2 dataset.
- Please note the *optional* explainability attributes with each submitted shot, which indicate a timestamp (in seconds) of a frame within the shot and bounding box coordinates.
- All runs will be submitted via the evalbase platform system from your team account. Please submit each run in a separate uncompressed XML file.
- Please consult the general TRECVID schedule page for query release and run submission due dates
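
A quick sanity check on shot IDs can catch keyframe file names before submission. The sketch below assumes that both the video ID and the shot number are runs of decimal digits; the text above gives only the "shotXXXX_YY" shape, so the V3C2 readme remains the authority. The names `parse_shot_id` and `is_valid_shot_id` are hypothetical helpers.

```python
import re

# Assumption: XXXX and YY are both decimal digit strings of unspecified width.
SHOT_ID_PATTERN = re.compile(r"shot(\d+)_(\d+)")


def parse_shot_id(shot_id):
    """Split a shot ID into (video_id, shot_number) strings.

    Strings are returned rather than integers so that any zero
    padding in the original ID is preserved. Raises ValueError when
    the ID does not match the assumed "shotXXXX_YY" shape.
    """
    match = SHOT_ID_PATTERN.fullmatch(shot_id)
    if match is None:
        raise ValueError(f"not a valid shot ID: {shot_id!r}")
    return match.group(1), match.group(2)


def is_valid_shot_id(shot_id):
    """Return True when `shot_id` matches the assumed shape."""
    try:
        parse_shot_id(shot_id)
    except ValueError:
        return False
    return True
```

Using `fullmatch` rejects anything with a trailing suffix, such as a file extension left over from a keyframe file name.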
Evaluation:
- All 20 queries will be evaluated using the ground truth created in 2024 by assessors at NIST after pooling and sampling.
- Please note that NIST uses a number of rules in manual assessment of system output.
 Measures:
- Mean extended inferred average precision (mean xinfAP)
will be applied. This allows sampling density to vary, e.g. so that it can be 100% in the top strata, which are most important for average precision.
- As in past years, other detailed measures based on recall and precision will be provided by the sample_eval software.
- Speed will also be measured: clock time per query search, reported in seconds (to one decimal place), must be provided in each run.
- A special metric will be applied to score Novelty runs such that more credit can be given to unique shots.
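
For intuition only, the sketch below computes plain average precision over fully judged results, together with its mean over queries. It is not xinfAP: the official measure estimates average precision from stratified samples of the pooled results and is produced by the sample_eval software. The function names and the toy data are assumptions made for illustration; `format_elapsed` simply shows one way to report a clock time to one decimal place.

```python
def average_precision(ranked_shots, relevant_shots):
    """Plain (non-inferred) average precision for one query.

    Assumes `ranked_shots` contains no duplicate shot IDs and that
    every relevant shot is known, which is not the case under sampling.
    """
    relevant = set(relevant_shots)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, shot_id in enumerate(ranked_shots, start=1):
        if shot_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(per_query):
    """Mean of average precision over queries.

    `per_query` maps a query ID to a (ranked_shots, relevant_shots) pair.
    """
    if not per_query:
        return 0.0
    values = [average_precision(r, rel) for r, rel in per_query.values()]
    return sum(values) / len(values)


def format_elapsed(seconds):
    """Format a per-query clock time in seconds to one decimal place."""
    return f"{seconds:.1f}"
```

With the ranking a, b, c, d and relevant shots a and c, the precisions at the two hits are 1/1 and 2/3, so the average precision is 5/6.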
Open Issues:
     
    
      
      