Guidelines for the TREC-2001 Video Track
Goal:
- Promote progress in content-based retrieval from digital video via open,
  metrics-based evaluation.
Data:
Tasks:
- Identify the shot boundaries in the given video clip(s) (automatic); a toy
  illustration follows this list
- Given a statement of information need, return a ranked list of shots
  (number to be determined) which best satisfy the need. Interactive or
  fully automatic approaches will be allowed. Some topics (known-item
  searches) may be crafted so as to be satisfied only by a small number
  of clips, known at topic creation. Please note the following restrictions
  for this task:
  - The only manually created information that search systems are allowed
    to use will be that which is available as part of the test collection,
    namely: the existing transcripts associated with the NIST files and the
    existing descriptions associated with the BBC material.
  - Systems are allowed to use transcripts created by automatic speech
    recognition (ASR), but any group which does this must submit a run
    using ONLY the ASR transcript data as a baseline.
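The choice of detection method is entirely up to the participants. Purely as a
toy illustration of what the shot boundary task asks for (and not a recommended
or reference approach), the following Java sketch flags a hard cut wherever the
L1 distance between consecutive frame histograms exceeds a threshold. The
precomputed histogram array and the threshold are hypothetical inputs, not part
of the track definition, and gradual transitions are ignored.

    import java.util.ArrayList;
    import java.util.List;

    // Toy cut detector (illustration only). Assumes each frame has already
    // been reduced to a normalized gray-level histogram; a boundary is
    // declared between frames f-1 and f when the histograms differ enough.
    public class NaiveCutDetector {

        static List<Integer> detectCuts(double[][] frameHistograms, double threshold) {
            List<Integer> boundaries = new ArrayList<>();
            for (int f = 1; f < frameHistograms.length; f++) {
                double dist = 0.0;
                for (int bin = 0; bin < frameHistograms[f].length; bin++) {
                    dist += Math.abs(frameHistograms[f][bin] - frameHistograms[f - 1][bin]);
                }
                if (dist > threshold) {
                    boundaries.add(f); // cut between frame f-1 and frame f
                }
            }
            return boundaries;
        }
    }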
Motivations for initial task choices:
- Shot boundary detection
  - needed for higher-level tasks
  - easier entry due to existing base of example work, software, ...
- Known-item(s) search
  - reflect a significant type of user need ("I know it/they are there - somewhere!")
  - lower evaluation costs; human assessors not needed since "answers"
    identified by topic author
- General statements of information need
  - most diverse, toughest for systems, costliest to evaluate, but ultimately
    probably most important for real users
Example types of needs:
I'm interested in video material / information about:
- a specific person
  e.g., I want all the information you have on Ronald Reagan.
- one or more instances of a category of people
  e.g., Find me some footage of men wearing hardhats.
- a specific thing
  e.g., I'm interested in any material on Hoover Dam. I'm looking for
  a picture of the OGO satellite.
- one or more instances of a category of things
  e.g., I need footage of helicopters.
- a specific event/activity
  e.g., I'm looking for a clip of Ronald Reagan reading a speech about
  the space shuttle.
- one or more instances of a category of events/activities
  e.g., I want to include several different clips of rockets taking off.
  I need to explain what cavitation is all about.
- other?
Topics:
- Describe the information need - input to systems and guide to humans assessing
  relevance of system output
- To be developed in the first year mainly by the participants (5 topics per
  group minimum) and some by NIST
- Multimedia - subject to the nature of the need and the questioner's choice
  of expression
- Mostly in the realm of the doable for current systems - as determined by
  participants
- As realistic in intent and expression as possible - we can imagine a trained
  searcher trying to find material for reuse in a large video archive, asking
  for this information or video material in this way
- Suggested template for topics to be submitted:
  - Author/group
  - Text description of the information need
  - Examples* of what is wanted:
    - reference to video clip
    - reference to image
    - reference to audio
  - for interactive processing? (Y/N)
  - for fully automatic processing? (Y/N)
  - Satisfiable by only ? clip(s) in the collection? (Y/N)
* If possible, the examples should come from outside the test data. They
could be taken from NIST or OpenVideo material not part of the test collection
or from other public domain sources. If the example comes from the test
collection, the text description should be such that using a quotation
from the test collection is plausible, e.g., "I want to find all the OTHER
shots dealing with X." A search for a single shot cannot be described with
an example from the target shot.
- Official topics:
Evaluation:
Each participating group is allowed to submit the results from up to
two system variants.
A submission checker program in Java
is provided to find some basic errors in submissions, but it is the
participating group's responsibility to submit well-formed data for
evaluation. There is no guarantee that ill-formed submissions will be
evaluated.
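The checker itself is distributed separately and is not reproduced here. Purely
as an illustration of the most basic kind of error it can catch - XML that is
not well-formed - the following small Java sketch (hypothetical, not the NIST
checker) parses a submission file with a standard XML parser and reports the
first problem found.

    import java.io.File;
    import javax.xml.parsers.SAXParser;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.SAXParseException;
    import org.xml.sax.helpers.DefaultHandler;

    // Minimal well-formedness check (illustration only, not the NIST checker).
    public class WellFormednessCheck {
        public static void main(String[] args) throws Exception {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            try {
                parser.parse(new File(args[0]), new DefaultHandler());
                System.out.println(args[0] + ": well-formed");
            } catch (SAXParseException e) {
                System.out.println(args[0] + ": not well-formed, line "
                        + e.getLineNumber() + ": " + e.getMessage());
            }
        }
    }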
- Shot boundary detection
  - Automatic comparison to human-annotated reference - details here
- Known-item(s) search
  - Automatic comparison to reference derived from known items identified
    in topics.
  - Result set size: 100 shots maximum
  - Per-search measure: average precision - the mean of the precision
    obtained after each known item is retrieved, using zero as the precision
    for known items not retrieved (see the sketch after this list)
- General statements of information need
  - Human assessment per shot of whether the shot meets the need or not
  - Result set size: 20 shots maximum
  - Per-search measures:
    - precision
    - recall (we will attempt this based on the union of submitted shots
      judged relevant, no promises)
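To make the known-item measure concrete, here is a small illustrative Java
sketch of the per-search computation described above: the precision is recorded
at the rank of each known item that appears in the submitted ranking, zero is
contributed for each known item never retrieved, and the mean over all known
items is reported. The shot identifiers are hypothetical and the sketch is not
the NIST scoring software.

    import java.util.List;
    import java.util.Set;

    // Illustrative known-item average precision (not the NIST scoring code).
    public class KnownItemAveragePrecision {

        // ranking: submitted shot ids in rank order (at most 100 for this task)
        // knownItems: the shot ids identified at topic creation
        static double averagePrecision(List<String> ranking, Set<String> knownItems) {
            double sum = 0.0;
            int found = 0;
            for (int rank = 1; rank <= ranking.size(); rank++) {
                if (knownItems.contains(ranking.get(rank - 1))) {
                    found++;
                    sum += (double) found / rank; // precision after this known item
                }
            }
            // Known items never retrieved contribute zero precision.
            return sum / knownItems.size();
        }

        public static void main(String[] args) {
            // Hypothetical example: known items found at ranks 2 and 4, one never found.
            List<String> ranking = List.of("s12", "s7", "s30", "s4");
            Set<String> known = Set.of("s7", "s4", "s99");
            // (1/2 + 2/4 + 0) / 3 = 0.333...
            System.out.println(averagePrecision(ranking, known));
        }
    }

For the general search task the per-search precision is simply the number of
returned shots judged relevant divided by the number of shots returned (at
most 20).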
The software used to evaluate the submissions
is available to researchers along with a high-level description of the
various matching and scoring algorithms involved. This software was
produced by NIST, an agency of the U.S. government, and by statute is
not subject to copyright in the United States. Recipients of this
software assume all responsibilities associated with its operation,
modification and maintenance.
Results for TREC-2001:
The evaluated submissions are available from the TREC archive. Tables of
the measures calculated by NIST are also
available.
Milestones for TREC-2001:
- 15. Jan
- Revised proposal posted to trecvid discussion list for comment
- 15. Feb
- Groups intending to participate send short application to NIST.
- 1. Mar
- Participating groups post to the trecvid list an estimate of how many
topics of what sorts they intend to contribute (e.g., known-item
topics that will need a human in the loop, general search requests for
interactive and fully automatic processing, known-item topics that can
be handled fully automatically, etc.)
- 16. Apr
- Participating groups submit planned test topics to NIST.
NIST will pool them - interactive vs batch, known item(s) searches
versus general. Systems will be tested against the union of the appropriate
submitted topics. If enough are created, a random sample can be distributed
for use in system development.
- 1. May
- Remaining detail of guidelines complete, including schedule for distribution
of test topics and for evaluation.
- 18. May
- Topic definitions frozen
- 15. June
- Test set for shot boundary detection announced.
Submission checker program available
- 1. August
- Shot boundary detection submissions due at NIST for evaluation.
Here is a DTD for shot boundary
results on one file, one for
results on multiple files, and a small example of what a site would
send to NIST for evaluation. Please check your submission to see that it is
well-formed.
- 17. Aug
- General and known-item search submission due at NIST for
evaluation. Here is a DTD for search
results on one topic, one for
results on multiple topics, and a small example of what a site would
send to NIST for evaluation. Please check your submission to see that it is
well-formed.
- 1. Oct
- Results of shot boundary detection evaluation returned to participants
- 5. Oct
- Results of evaluations returned to participants
- 28. Oct
- Conference notebook papers due at NIST
- 13.-16. Nov
- TREC 2001 Conference at NIST in Gaithersburg, Md.
- 4. Feb 2002
- Final proceedings papers due at NIST
Guideline issues still to be resolved:
Contacts:
- Coordinator:
- NIST contact:
Last updated: Friday, 01-Mar-2019 14:20:54 EST
Date created: Tuesday, 21-Nov-00
For further information contact Paul
Over (over@nist.gov)