Guidelines for the TRECVID 2003 Evaluation
(last updated: Tuesday, 15-Jun-2004 13:43:10 UTC)
1. Goal:
The main goal of the TREC Video Retrieval Evaluation (TRECVID) is to
promote progress in content-based retrieval from digital video via
open, metrics-based evaluation.
2. Tasks:
TRECVID is a laboratory-style evaluation that attempts to model
real world situations or significant component tasks involved
in such situations.
There are four main tasks, each with an associated test, and participants must
complete at least one of them in order to attend the workshop.
- shot boundary determination
- story segmentation
- high-level feature extraction
- search
2.1 Shot detection:
The task is as follows: identify the shot boundaries with their location and
type (cut or gradual) in the given video clip(s).
2.2 Story segmentation:
The task is as follows: given the story boundary test collection,
identify the story boundaries with their location (time) and type
(miscellaneous or news) in the given video clip(s). This is a new task
for 2003.
A story
can be composed of multiple shots, e.g., an anchorperson introduces
a reporter and the story finishes back in the studio setting. On the other
hand, a single shot can contain story boundaries, e.g., an anchorperson
switching to the next news topic.
The task is based on manual story boundary annotations made by LDC
for the TDT-2 project. Therefore, LDC's definition of a story
will be used in the task: A news story is defined as a
segment of a news broadcast with a coherent news focus which
contains at least two independent, declarative clauses. Other
coherent segments are labeled as miscellaneous. These
non-news stories cover a mixture of footage: commercials,
lead-ins and reporter chit-chat. Guidelines that were used for
annotating the TDT-2 dataset are available at http://www.ldc.upenn.edu/Projects/TDT2/Guide/manual.front.html. Other useful documents are the guidelines document for the annotation of the TDT4 corpus and a similar document on TDT3, which discuss the annotation guidelines for the different corpora. Section 2 in the TDT4 document is of particular interest for the story segmentation task.
Note: adjacent non-news stories are merged together and
annotated as one single story classified as "miscellaneous".
Differences with the TDT-2 story segmentation task:
- TRECVID 2003 uses a subset of the TDT-2 dataset: only the video sources.
- The video stream is available to enhance story segmentation.
- The task is modeled as a retrospective action, so the use of global data
is allowed.
- TRECVID 2003 has a story classification task (which is optional).
There are several required and recommended runs:
- Required: Video + Audio (no ASR/CC)
- Required: Video + Audio + ASR
- Required: ASR (no Video + Audio)
- The ASR in the required and recommended runs is the ASR provided
by LIMSI. We have dropped the use of the CC data on the hard drive
and adopted the LIMSI ASR rather than the ASR provided on the hard
drive because the LIMSI ASR is based on the MPEG-1 version of the
video and requires no alignment. Additional runs can use other
ASR systems.
- It is recommended that story segmentation runs be complemented
with story classification.
With TRECVID 2003's story segmentation task, we hope to show how
video information can enhance story segmentation algorithms.
2.3 Feature extraction:
Various high-level semantic features, i.e., concepts such as "Indoor/Outdoor",
"People", "Speech", etc., occur frequently in video databases. The proposed
task will contribute to work on a benchmark for evaluating the effectiveness of
detection methods for semantic concepts.
The task is as follows: given the feature test collection, the common
shot boundary reference for the feature extraction test collection,
and the list of feature definitions (see below), participants will
return for each feature a list of at most 2000 shots from the test
collection, ranked according to the probability that the feature is
present. Each feature is assumed to be binary,
i.e., it is either present or absent in the given reference shot.
Participants are encouraged to make their feature detection submission
available to other participants for use in the search task. Donors
should provide the donated detection output over the search test collection
in the feature exchange format by the date indicated in the schedule
below.
Description of features to be detected:
These descriptions are
meant to be clear to humans, e.g., assessors/annotators creating truth
data and system developers attempting to automate feature detection.
They are not meant to indicate how automatic detection should be
achieved.
If the feature is true for some frame (sequence) within the shot, then
it is true for the shot, and vice versa. This is a simplification
adopted for the benefits it affords in pooling of results and
approximating the basis for calculating recall.
NOTE: In the following, "contains x" is short for "contains x to a
degree sufficient for x to be recognizable as x to a human". This
means, among other things, that unless explicitly stated, partial
visibility or audibility may suffice.
- Outdoors: segment contains a recognizably outdoor location, i.e., one
outside of buildings. Should exclude all scenes that are indoors or are
close-ups of objects (even if the objects are outdoors).
- News subject face: segment contains the face of at least one human
news subject. The face must be of someone who is not an anchor person,
news reporter, correspondent, commentator, news analyst, or other
sort of news person.
- People: segment contains at least THREE humans.
- Building: segment contains a building. Buildings are walled
structures with a roof.
- Road: segment contains part of a road - any size, paved or not.
- Vegetation: segment contains living vegetation in its natural environment.
- Animal: segment contains an animal other than a human.
- Female speech: segment contains a female human voice uttering words
while the speaker is visible.
- Car/truck/bus: segment contains at least one automobile, truck, or
bus exterior.
- Aircraft: segment contains at least one aircraft of any sort.
- News subject monologue: segment contains an event in which a
single person, a news subject rather than a news person, speaks for a long
time without interruption by another speaker. Short pauses are ok.
- Non-studio setting: segment is not set in a TV broadcast studio.
- Sporting event: segment contains video of one or more organized
sporting events.
- Weather news: segment reports on the weather.
- Zoom in: camera zooms in during the segment.
- Physical violence: segment contains violent interaction between
people and/or objects.
- Person x: segment contains video of person x (x = Madeleine Albright).
2.4 Search:
The task is as follows: given the search test collection, a multimedia
statement of information need (topic), and the common shot boundary
reference for the search test collection, return a ranked list of at
most 1000 common reference shots from the test collection, which best
satisfy the need. Please note the following restrictions for this
task:
- TRECVID 2003 will set aside the challenging problem of
fully automatic topic analysis and query generation. Submissions will
be restricted to those with a human in the loop, i.e., manual or
interactive runs as defined below.
- Because the choice of features and their combination for search is an
open research question, no attempt will be made to
restrict groups with respect to their use of features in
search. However, groups making manual runs should report their queries,
query features, and feature definitions.
- Every submitted run must contain a result set for each topic.
- One baseline run will be required of every manual system:
  - A run based only on the text from the LIMSI ASR output and on the
text of the topics.
- In order to maximize comparability within and across participating
groups, all manual runs within any given site must be carried out by the
same person.
- An interactive run will contain one result for each and every topic, each
such result using the same system variant. Each result for a topic can come
from only one searcher, but the same searcher does not need to be used
for all topics in a run. Here are some suggestions for interactive experiments.
  - The searcher should have no experience of the topics beyond the general
world knowledge of an educated adult.
  - The search system cannot be trained, pre-configured, or otherwise tuned
to the topics.
- The maximum total elapsed time limit for each topic (from the time
the searcher sees the topic until the time the final result set for
that topic is returned) in an interactive search run will be 15
minutes. For manual runs the manual effort (topic to query
translation) for any given topic will be limited to 15 minutes.
- All groups submitting search runs must include the actual elapsed
time spent as defined in the videoSearchRunResult.dtd.
- Groups carrying out interactive runs are encouraged to measure user
characteristics and satisfaction as well and report this with their
results, but they need not submit this information to NIST. Here are
some examples of instruments for collection of user
characteristics/satisfaction data developed and used by the TREC
Interactive Track for several years.
- In general, groups are reminded to use good experimental design
principles. These include, among other things, randomizing the order
in which topics are searched for each run so as to balance learning
effects.
3. Video data:
NOTE: TRECVID 2003 is now over. Unless indicated, the 2003 test and
development data is fully available only to TRECVID participants.
This includes the basic MPEG-1 files, and derived files such as ASR,
story segmentation, and transcript files.
LDC may make some of the data
generally available.
As you will see below, other data such as topics, feature donations,
the results
of the collaborative annotation, and various sorts of truth data
created by/for NIST are freely available from this page.
Sources
The total identified collection comprises
- ~120 hours (241 30-minute programs) of ABC World News Tonight and CNN
Headline News recorded by the Linguistic Data Consortium from late
January through June 1998
- ~13 hours of C-SPAN programming (~30 mostly 10- or 20-minute
programs), about two thirds from 2001, the others from 1999, and one or
two from 1998 and 2000. The C-SPAN programming includes various
government committee meetings, discussions of public affairs, some
lectures, news conferences, forums of various sorts, public hearings, etc.
Associated textual data (with associated file extensions)
Provided with the ABC/CNN MPEG-1 data (*.mpg) will be the output of an
automatic speech recognition system (*.as1) and a
closed-captions-based transcript. The transcript will be available in
two forms: simple tokens (*.tkn) with no other information for the
development and test data; tokens grouped into stories (*.src_sgm)
with story start times and type for the development collection. The
times in the ASR and transcript data are based on the analogue
version of the video and so are offset from the current MPEG-1
digital version.
LDC has provided alignment tables so that the old times can be used
with the new video. Here is a table for the
development collection based on LDC's manual examination of three
points in each file.
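The layout of LDC's alignment table is not reproduced here. Purely as an
illustration of how such anchor points might be applied (this is an
assumption, not LDC's procedure, and the anchor values are made up), the
sketch below maps an analogue-based time onto the MPEG-1 timeline by
piecewise-linear interpolation between per-file anchor points:

    # Illustrative sketch only: the real alignment table format and the
    # mapping LDC intends may differ. Assumes, per video file, a short
    # sorted list of (time_in_analogue_version, time_in_mpeg1_version)
    # anchor points in seconds.
    from bisect import bisect_right

    def align_time(old_time, anchors):
        """Map a time from the analogue-based ASR/transcript to the MPEG-1
        video by piecewise-linear interpolation between anchor points."""
        if len(anchors) < 2:
            raise ValueError("need at least two anchor points")
        xs = [x for x, _ in anchors]
        # clamp to the covered range, then find the bracketing pair
        i = min(max(bisect_right(xs, old_time) - 1, 0), len(anchors) - 2)
        (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
        return y0 + (old_time - x0) * (y1 - y0) / (x1 - x0)

    # e.g. three manually examined points for one hypothetical file:
    anchors = [(0.0, 1.5), (900.0, 901.6), (1795.0, 1796.8)]
    print(round(align_time(450.0, anchors), 2))   # -> 451.55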
Additional ASR output from LIMSI-CNRS:
Jean-Luc Gauvain of the Spoken Language
Processing Group at LIMSI has graciously donated ASR output for the
entire collection. Be sure to credit them
for this contribution by a non-participant:
J.L. Gauvain, L. Lamel, and G. Adda.
The LIMSI Broadcast News Transcription System.
Speech Communication, 37(1-2):89-108, 2002.
ftp://tlp.limsi.fr/public/spcH4_limsi.ps.Z
Development versus test data
About 6 hours of data were selected from the total collection to be
used solely as the shot boundary test collection.
The remainder was sorted more or less chronologically (C-SPAN covers a
slightly different period than the ABC/CNN data). The first half was
designated the feature / search / story segmentation development
collection. The second half is the feature / search / story segmentation
test collection. Note that the story segmentation task will not use
the C-SPAN files for development or test.
All of the development and test data with the exception of the shot
boundary test data will be shipped by the Linguistic Data
Consortium (LDC) on an IDE hard disk to each participating site at no
cost to the participants. Each such site will need to offload the data
onto local storage and pay to return the disk to LDC. The size of the data
on the hard drive will be a little over 100 gigabytes. The shot boundary
test data (~5 gigabytes) will be shipped by NIST to participants on
DVDs (DVD+R).
Restrictions on use of development and test data
Each participating group is responsible for adhering to the letter and
spirit of these rules, the intent of which is to make the TRECVID
evaluation realistic, fair, and maximally informative about system
effectiveness as opposed to other confounding effects on
performance. Submissions which, in the judgment of the coordinators
and NIST, do not comply will not be accepted.
Test data
The test data shipped by LDC cannot be used for system development and
system developers should have no knowledge of it until after they have
submitted their results for evaluation to NIST. Depending on the size
of the team and tasks undertaken, this may mean isolating certain team
members from certain information or operations, freezing system
development early, etc.
Participants may use donated feature extraction output from the
test collection but incorporation of such features should be automatic
so that system development is not affected by knowledge of the
extracted features. Anyone doing searches must be isolated from
knowledge of that output.
Participants cannot use the knowledge that the test collection comes
from news video recorded during the first half of 1998 in the
development of their systems. This would be unrealistic.
Development data
The development data shipped by LDC is intended for the participants'
use in developing their systems. It is up to the participants how the
development data is used, e.g., divided into training and validation
data, etc.
Other data sets created by LDC for earlier evaluations and derived
from the same original videos as the test data cannot be used in
developing systems for TRECVID 2003.
If participants use the output of an ASR system, they must submit at least one run using that provided on the loaner drive from LDC. They are free to use the output of other ASR systems in additional runs.
If participants use a closed-captions-based transcript, they must use
only that provided on the loaner drive from LDC.
Participants may use other development resources not excluded in these
guidelines. Such resources should be reported at the workshop. Note
that use of other resources will change the submission's status with
respect to system development type, which is described next.
There is a group of participants creating and sharing annotation of
the development data. See the Video
Collaborative Annotation Forum webpage for details. Here is the
set of collaborative annotations created for TRECVID 2003.
In order to help isolate system development as a factor in system performance
each feature extraction task submission, search task submission,
or donation of extracted features must declare its type:
- A - system trained only on the common development collection and the common annotation of it
- B - system trained only on the common development collection but not on (just) the common annotation of it
- C - system is not of type A or B
3.1 Common shot boundary reference and keyframes:
A common shot boundary reference has again kindly been provided by
Georges Quenot at CLIPS-IMAG. Keyframes have also been selected for
use in the search and feature extraction tasks. NIST can provide the
keyframes on DVD+R with some delay to participating groups unable to
extract the keyframes themselves.
The emphasis in the common shot
boundary reference will be on the shots, not the transitions.
The shots are contiguous. There are no gaps between them. They
do not overlap. The media time format is based on the Gregorian day
time (ISO 8601) norm. Fractions are defined by counting pre-specified
fractions of a second. In our case, the frame rate will likely be
29.97, so the duration of one frame (1001 fractions of 1/30000 of a
second) is specified as "PT1001N30000F".
The video id has the format "XXX" and the shot id "shotXXX_YYY". The
"XXX" is the sequence number of the video onto which the video file name
is mapped; this mapping is listed in the "collection.xml" file. The "YYY"
is the sequence number of the shot. Keyframes are identified by a
suffix "_RKF" for the main keyframe (one per shot) or "_NKRF" for
additional keyframes derived from subshots that were merged so that
shots have a minimum duration of 2 seconds.
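A minimal sketch (not an official TRECVID tool) of how the id and duration
conventions above might be parsed; the regular expressions are assumptions
based solely on the description in this section, so consult the README and
time.elements files for the authoritative formats:

    import re

    def fraction_duration_to_seconds(value):
        """Convert a duration such as "PT1001N30000F" (1001 fractions, each
        1/30000 of a second, i.e. one frame at 29.97 fps) into seconds."""
        m = re.fullmatch(r"PT(\d+)N(\d+)F", value)
        if not m:
            raise ValueError("unexpected duration format: %s" % value)
        count, fractions_per_second = int(m.group(1)), int(m.group(2))
        return count / fractions_per_second

    def parse_shot_or_keyframe_id(identifier):
        """Split an id such as "shot123_45" or "shot123_45_RKF" into the
        video sequence number, shot sequence number, and keyframe suffix."""
        m = re.fullmatch(r"shot(\d+)_(\d+)(?:_([A-Z]+))?", identifier)
        if not m:
            raise ValueError("unexpected shot id: %s" % identifier)
        return int(m.group(1)), int(m.group(2)), m.group(3)

    print(fraction_duration_to_seconds("PT1001N30000F"))   # ~0.0334 s
    print(parse_shot_or_keyframe_id("shot123_45_RKF"))     # (123, 45, 'RKF')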
The common shot boundary directory contains these file(type)s:
- shots2003 - a directory with one file of shot information for each
video file in the development/test collection
- xxx.mp7.xml - master shot list for video with id "xxx" in
collection.xml
- collection.xml - a list of the files in the collection
- README - info on the segmentation
- time.elements - info on the meaning/format of the MPEG-7 MediaTimePoint and MediaDuration elements
- shots2003.tar.gz - gzipped tar file of the shots2003 directory
4. Information needs and topics:
4.1 Example types of information needs
I'm interested in video material / information about:
- a specific person
- one or more instances of a category of people
- a specific thing
- one or more instances of a category of things
- a specific event/activity
- one or more instances of a category of events/activities
- a specific location
- one or more instances of a category of locations
- combinations of the above
As an experiment, NIST may create a topic of the form "I'm looking
for video that tells me the name of the person/place/thing/event in
the image/video example".
Topics may target commercials as well as news content.
4.2 Topics:
The topics, formatted multimedia statements of information need, will
be developed by NIST who will control their distribution. The topics
will express the need for video concerning people, things, events,
locations, etc. and combinations of the former. Candidate topics (text
only) will be created at NIST by mining various news sources from the
time period of the test collection and a log of actual queries logged
and provided by the BBC. The test collection will then be examined to
see how frequently relevant shots occur for each topic. Each topic will
then either be accepted or rejected. Accepted topics will be enhanced
with non-textual examples from the Web if possible and from the
development data if need be. Current plans are to use an InforMedia*
client as part of the testing to see that some relevant shots occur in
the test collection. The goal is to create 25 topics.
* Note: The identification of any commercial product or trade name
does not imply endorsement or recommendation by the National Institute
of Standards and Technology.
- Topics describe the information need. They are input to systems and a
guide to humans assessing the relevance of system output.
- Topics are multimedia objects - subject to the nature of the need
and the questioner's choice of expression.
- Topics are as realistic in intent and expression as possible.
- Template for a topic:
  - Title
  - Brief textual description of the information need (this text may
contain references to the examples)
  - Examples* of what is wanted:
    - reference to a video clip
      - Optional brief textual clarification of the example's relation to the need
    - reference to an image
      - Optional brief textual clarification of the example's relation to the need
    - reference to audio
      - Optional brief textual clarification of the example's relation to the need
* If possible, the examples will come from outside the test
data. They could be taken from various stable public domain
sources. If the example comes from the test collection, the text
description will be such that using a quotation from the test
collection is plausible, e.g., "I want to find all the OTHER shots
dealing with X." A search for a single shot cannot be described with
an example from the target shot.
5. Submissions and Evaluation:
Please note: Only submissions which are valid when checked against
the supplied DTDs will be accepted. You must check your
submission. Various checkers exist, e.g., the one at Brown
University, Xerces-J, etc.
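As one way to run such a check locally (a sketch, not a NIST-supplied
checker; it assumes the Python lxml library and uses placeholder file names,
with the videoSearchRunResult.dtd mentioned in section 2.4 as an example):

    # Sketch of a local validity check. The file names are placeholders;
    # use the DTD supplied for the task you are submitting to and your own
    # submission file. lxml is only one of several tools that can do this.
    from lxml import etree

    dtd = etree.DTD("videoSearchRunResult.dtd")     # supplied DTD
    doc = etree.parse("mygroup_search_runs.xml")    # your submission
    if dtd.validate(doc):
        print("submission parses and is valid against the DTD")
    else:
        print(dtd.error_log.filter_from_errors())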
The results of the evaluation will be made available to attendees at
the TRECVID 2003 workshop and will be published in the final
proceedings and/or on the TRECVID website within six months after the
workshop. All submissions will likewise be available to interested
researchers via the TRECVID website within six months of the workshop.
5.1 Shot boundary detection
- Participating groups may submit up to 10 runs. All runs will be
evaluated.
- The format of submissions will be the same as in 2002. Here is a
DTD for shot boundary results on one video file, one for results on
multiple files, and a small example of what a site would send to NIST
for evaluation. Please check your submission to see that it is
well-formed.
- Please send your submissions (up to 10 runs) in an email to
over@nist.gov. Indicate somewhere (e.g., in the subject
line) which group you are attached to so that we can match you up with
the active participant's database.
- Automatic comparison to human-annotated reference
- Measures (a simplified sketch of these measures follows this list):
  - All transitions: for each file, precision and recall for
detection; for each run, the mean precision and recall per reference
transition across all files
  - Gradual transitions only: "frame-recall" and "frame-precision" will be
calculated for each detected gradual reference transition. Averages
per detected gradual reference transition will be calculated for each
file and for each submitted run. Details are available.
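A simplified sketch of these measures. The matching rule used here (any
frame overlap between a submitted and a reference transition) is an
assumption made only for illustration; the official rules are in the
"Details" document referenced above.

    # Transitions are (first_frame, last_frame) pairs; a cut can be
    # represented as a range of length one.

    def overlap(a, b):
        return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

    def detection_precision_recall(reference, submitted):
        """Per-file precision and recall for transition detection."""
        detected = sum(1 for r in reference
                       if any(overlap(r, s) > 0 for s in submitted))
        correct = sum(1 for s in submitted
                      if any(overlap(s, r) > 0 for r in reference))
        recall = detected / len(reference) if reference else 0.0
        precision = correct / len(submitted) if submitted else 0.0
        return precision, recall

    def frame_precision_recall(ref_gradual, sub_gradual):
        """Frame-precision and frame-recall for one detected gradual
        reference transition and the submitted transition matching it."""
        common = overlap(ref_gradual, sub_gradual)
        frame_recall = common / (ref_gradual[1] - ref_gradual[0] + 1)
        frame_precision = common / (sub_gradual[1] - sub_gradual[0] + 1)
        return frame_precision, frame_recall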
5.2 Feature extraction
Submissions
- Participating groups may submit up to 10 runs. All runs will be
evaluated.
- For each feature in a run, participants will return at most 2000 shots.
- Here is a DTD for feature extraction results of one run, one for
results from multiple runs, and a small example of what a site would
send to NIST for evaluation. Please check your submission to see that
it is well-formed.
- Please send your submission in an email to Cedric.Coulon@nist.gov.
Indicate somewhere (e.g., in the subject line)
which group you are attached to so that we can match you up with the
active participant's database. Send all of your runs as one file or
send each run as a file, but please do not break up your submission any
more than that. A run will contain results for all features you worked
on.
Evaluation
- The unit of testing and performance assessment will be the video shot
as defined by the track's common shot boundary reference. The
submitted ranked shot lists for the detection of each feature will be
judged manually as follows. We will take all shots down to some fixed
depth (in ranked order) from the submissions for a given feature -
using some fixed number of runs from each group in priority sequence
up to the median of the number of runs submitted by any group. We will
then merge the resulting lists and create a list of unique
shots. These will be judged manually down to some depth to be
determined by NIST based on available assessor time and the number of
correct shots found. NIST will maximize the number of shots judged
within practical limits. We will then evaluate each submission to its
full depth based on the results of assessing the merged subsets. This
process will be repeated for each feature. (A sketch of this pooling
step appears after this list.)
- If the feature is perceivable by the assessor for some frame (sequence),
however short or long, then we'll assess it as true; otherwise false.
We'll rely on the complex thresholds built into the human perceptual
systems. Search and feature extraction applications are likely -
ultimately - to face the complex judgment of a human with whatever
variability is inherent in that.
- Runs will be compared using precision and recall. Precision-recall
curves will be used as well as a measure which combines precision and
recall: (mean) average precision (see below under Search for details).
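A sketch of the pooling step described above for one feature. The pool
depth is a placeholder; the judged depth and the exact run selection are
determined by NIST as described, based on assessor resources.

    from statistics import median

    def pool_shots(runs_by_group, pool_depth):
        """runs_by_group maps a group name to its runs in priority order;
        each run is a ranked list of shot ids for the feature. Returns the
        set of unique shots to be judged."""
        runs_per_group = int(median(len(runs)
                                    for runs in runs_by_group.values()))
        pooled = set()
        for runs in runs_by_group.values():
            for run in runs[:runs_per_group]:     # runs in priority sequence
                pooled.update(run[:pool_depth])   # top of the ranked list
        return pooled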
5.3 Story segmentation
Submissions
- Participating groups may submit up to 10 runs. All runs will be
evaluated.
- The task is defined on the search dataset, which is partitioned into
a development and a test collection (cf. Section 3).
- The reference data is defined such that there are no gaps between
stories and stories do not overlap.
- The evaluation of the story segmentation task will be defined on the
video segment delimited by its clipping points (the overlap between the
MPEG file and the ground truth data). A table of clipping points is
available.
- For the segmentation task, a boundary <= the first clipping point will
be ignored (truth and submission); a boundary >= the last clipping
point will be ignored (truth and submission).
- For the classification task, only and ALL of the time interval
between the two clipping points for a file will be considered in
scoring, even for parts of stories split by a clipping point.
- Here is a DTD for a story segmentation/classification submission from
a group, a partial example of a segmentation-only submission, and
another of a classification submission. Please check your submission
to see that it is well-formed. Stories within a run result must be in
chronological sequence with the earliest at the beginning of the
file. Submissions should include boundaries for all the videos in the
test set.
- Please send your submissions (up to 10 runs) in an email to
Cedric.Coulon@nist.gov. Indicate somewhere (e.g., in the subject
line) which group you are attached to so that we can match you up with
the active participant's database.
Evaluation
- Since story boundaries are rather abrupt changes of focus, story
boundary evaluation is modeled on the evaluation of shot boundaries
(the cuts, not the gradual boundaries). A story boundary is expressed
as a time offset with respect to the start of the video file in
seconds, accurate to the nearest hundredth of a second. Each reference
boundary is expanded with a fuzziness factor of five seconds in each
direction, resulting in an evaluation interval of 10 seconds. (A
simplified sketch of the boundary and classification measures appears
after this list.)
- A reference boundary is detected when one or more
computed story boundaries lies within its evaluation interval.
- If a computed boundary does not fall in the evaluation interval
of a reference boundary, it is considered a false alarm.
- Story boundary recall = number of reference boundaries detected / total
number of reference boundaries
- Story boundary precision = (total number of submitted boundaries minus
the total number of false alarms) / total number of submitted boundaries
- The evaluation of story classification is defined as follows: for each
reference news segment, we check in the submission file how many seconds
of this timespan are marked as news. This yields the total duration of
correctly identified news subsegments in seconds.
- News segment precision
= total time of correctly identified news subsegments/ total time of news segments in submission
- News segment recall
= total time of correctly identified news subsegments / total time of reference news segments
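A simplified sketch of these measures, using the stated five-second
fuzziness factor. Clipping-point handling and file-by-file aggregation are
omitted; this is an illustration, not the official scoring code.

    FUZZINESS = 5.0   # seconds on each side, i.e. a 10-second interval

    def story_boundary_precision_recall(reference, submitted):
        """Boundaries are times in seconds from the start of the file."""
        detected = sum(1 for r in reference
                       if any(abs(s - r) <= FUZZINESS for s in submitted))
        false_alarms = sum(1 for s in submitted
                           if not any(abs(s - r) <= FUZZINESS
                                      for r in reference))
        recall = detected / len(reference) if reference else 0.0
        precision = ((len(submitted) - false_alarms) / len(submitted)
                     if submitted else 0.0)
        return precision, recall

    def news_segment_precision_recall(reference_news, submitted_news):
        """Segments are (start, end) intervals in seconds labeled as news."""
        def total(segments):
            return sum(end - start for start, end in segments)
        correct = sum(max(0.0, min(r[1], s[1]) - max(r[0], s[0]))
                      for r in reference_news for s in submitted_news)
        precision = correct / total(submitted_news) if submitted_news else 0.0
        recall = correct / total(reference_news) if reference_news else 0.0
        return precision, recall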
Comparability with TDT-2 Results
Results of the TRECVID 2003 story segmentation task cannot be directly
compared to TDT-2 results because the evaluation datasets differ and
different evaluation measures are used. TRECVID 2003 participants have
shown a preference for a precision/recall oriented evaluation, whereas
TDT used (and is still using) normalized detection cost. Finally, TDT
was modeled as an on-line task, whereas TRECVID examines story
segmentation in an archival setting, permitting the use of global
information. However, the TRECVID 2003 story segmentation task
provides an interesting testbed for cross-resource experiments. In
principle, a TDT system could be used to produce an ASR+CC or
ASR+CC+Audio run.
5.4 Search
Submissions
- Participating groups may submit up to 10 prioritized runs. All runs
will be evaluated.
Each group doing interactive search should submit exactly one run for each
system variant. (System variants might differ in their user interfaces,
back-end functionality, indexing, etc.) Each such run will contain
one result for each and every topic using the system variant for that
run. Each result for a topic can come from only one searcher, but the
same searcher does not need to be used for all topics in a run.
If a site has more than one searcher's result for a given topic and
system variant, it will be up to the site to determine which searcher's
result is included in the submitted result. NIST will try to make provision
for the evaluation of supplemental results, i.e., ones NOT chosen
for the submission described above. Details on this will be available
by the time the topics are released.
- For each topic in a run, participants will return a list of at most
1000 shots. Here is a DTD for search results of one run, one for
results from multiple runs, and a small example of what a site would
send to NIST for evaluation. Please check your submission to see that
it is well-formed.
- Please send your submission in an email to ccoulon@nist.gov. Indicate
somewhere (e.g., in the subject line) which group you are attached to
so that we can match you up with the active participant's database. Send
all of your runs as one file or send each run as a file, but please do
not break up your submission any more than that. Remember, a run will
contain results for all of the topics.
Evaluation
- The unit of testing and performance assessment will be the video shot
as defined by the track's common shot boundary reference. The submitted
ranked lists of shots found relevant to a given topic will be judged
manually as follows. We will take all shots down to some fixed depth
(in ranked order) from the submissions for a given topic - using some
fixed number of runs from each group in priority sequence up to the
median of the number of runs submitted by any group. We will then
merge the resulting lists and create a list of unique shots. These
will be judged manually to some depth to be determined by NIST based
on available assessor time and the number of correct shots found. NIST
will maximize the number of shots judged within practical limits. We
will then evaluate each submission to its full depth based on the
results of assessing the merged subsets. This process will be repeated
for each topic.
- Per-search measures:
- average precision (definition below)
- elapsed time (for all runs)
- Per-run measure:
- mean average precision (MAP):
Non-interpolated average precision corresponds to the area
under an ideal (non-interpolated) recall/precision curve. To compute
this measure, the average precision for each topic is first calculated.
This is done by computing the precision after every retrieved relevant
shot and then averaging these precisions over the total number of
relevant/correct shots in the collection for that
topic/feature or the maximum allowed result set size (whichever is
smaller). Average precision favors highly ranked relevant
documents. It allows comparison of different size result
sets. Submitting the maximum number of items per result set can never
lower the average precision for that submission. The topic averages
are combined (averaged) across all topics in the appropriate set
to create the non-interpolated mean average precision (MAP) for
that set. (See the TREC-10
Proceedings appendix on common evaluation measures for more
information.) A sketch of this calculation follows below.
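A sketch of the per-topic average precision and the run-level MAP, based on
the definition above; it is an illustration, not NIST's evaluation software.
The relevant shots come from the assessors' judgments, and the maximum
result set size is 1000 for search (2000 for the feature task).

    def average_precision(ranked_shots, relevant_shots, max_result_size=1000):
        """Non-interpolated average precision for one topic or feature."""
        relevant_retrieved = 0
        precision_sum = 0.0
        for rank, shot in enumerate(ranked_shots[:max_result_size], start=1):
            if shot in relevant_shots:
                relevant_retrieved += 1
                precision_sum += relevant_retrieved / rank
        denominator = min(len(relevant_shots), max_result_size)
        return precision_sum / denominator if denominator else 0.0

    def mean_average_precision(topic_average_precisions):
        """Average the per-topic values over the appropriate topic set."""
        aps = list(topic_average_precisions)
        return sum(aps) / len(aps) if aps else 0.0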
6. Milestones:
- 10 Jun
- Video data express-shipped to participants by LDC
- 11 Jun
- Guidelines complete
- 16? Jun
- Common shot reference and key frames available
- 30 Jun
- Common annotation complete
- 18 Jul
- Shot boundary test collection DVDs shipped by NIST
- 15 Aug
- Search topics
available from TRECVID website to active participants.
The needed clips
from the MPEG-1 videos used as examples in the topics (to save
downloading the entire example videos) are available from the team at
Dublin City University. The file is under 17 Megabytes as opposed to
about 1 Gigabyte for the complete files in which the clips are
contained. In some cases the files may contain a frame or two more
than specified in the topic description, but participants can further
trim the clips if necessary.
- 17 Aug
- Shot boundary detection submissions due at NIST for evaluation.
- 22 Aug
- Feature extraction task submissions due at NIST for evaluation.
Feature extraction donations due at NIST
- 25 Aug
- Feature extraction donations available for active participants
- 7 Sep
- Story segmentation/typing submissions due at NIST for evaluation
- 12 Sep
- Results of shot boundary evaluations returned to participants
- 24 Sep
- Search task submissions due at NIST for evaluation
- 24 Sep - 15 Oct
- Search and feature assessment at NIST
- 10 Oct
- Results of story segmentation evaluations returned to participants
- 17 Oct
- Results of search and feature extraction evaluations returned to participants
- 24 Oct
- Speaker proposals due at NIST (Instructions)
- 2 Nov
- Notebook papers due at NIST (Instructions)
- 3 Nov
- Workshop registration closes
- 17-18 Nov 2003
- TRECVID Workshop at NIST in Gaithersburg, Md.
- 15 Dec
- Workshop papers and slides publicly available
- c. 12 Jan 2004
- Call for participation in TRECVID 2004 sent out
- c. 16 Feb 2004
- Applications for participation in TRECVID 2004 due at NIST
- 1 Mar 2004
- Final versions of TRECVID 2003 papers due at NIST
8. Results, submissions, and evaluated runs for active participants
Submissions and evaluated submissions can be found in the "Past
results" section of the TREC website. Access requires that "fair use
guidelines" forms be filled out. Instructions are on the Past Results webpage
Other products of the evaluation generally available can be found
using the following links:
9. Contacts:
- Coordinators:
- NIST contact:
- Email lists:
- Information and discussion for active workshop participants
  - trecvid2003@nist.gov
  - archive open to active participants only
  - NIST will subscribe the contact listed in your application to
participate when we have received it. Additional members of active
participant teams will be subscribed by NIST if they send email
indicating they want to be subscribed, giving the email address to use
and their name, and providing the TRECVID 2003 active participant's
password. Groups may combine the information for multiple team members
in one email.
Once subscribed, you can post to this list by sending your thoughts as
email to trecvid2003@nist.gov, where they will be sent out to everyone
subscribed to the list, i.e., the other active participants.
- General (annual) announcements about TRECVID (no discussion)
  - trecvid@nist.gov
  - open archive
  - If you would like to subscribe, log on using the account to which you
would like trecvid email to be directed. Send email to
lori.buckland@nist.gov and ask her to subscribe you to trecvid. This
list is used to notify interested parties about the call for
participation and broader issues. Postings will be infrequent.