Restrictions on use of development and test data

Each participating group is responsible for adhering to the letter and spirit of these rules, whose intent is to make the TRECVID evaluation realistic, fair, and maximally informative about system effectiveness, as opposed to other confounding effects on performance. Submissions that, in the judgment of the coordinators and NIST, do not comply will not be accepted.

Test data

The test data cannot be used for system development, and system developers should have no knowledge of it until after they have submitted their results to NIST for evaluation. Depending on the size of the team and the tasks undertaken, this may mean isolating certain team members from certain information or operations, freezing system development early, etc.

Participants may use donated feature extraction output from the test collection, but such features should be incorporated automatically so that system development is not affected by knowledge of the extracted features. Anyone doing searches must be isolated from knowledge of that output.

Participants may not use, in developing their systems, the knowledge that the test collection comes from news video recorded during a known time period. This would be unrealistic.

Development data

The development data is intended for the participants' use in developing their systems. How the development data is used is up to the participants, e.g., it may be divided into training and validation data.

Other data sets created by LDC for earlier evaluations and derived from the same original videos as the test data cannot be used in developing systems for TRECVID 2008.

If participants use the output of an ASR/MT system, they must submit at least one run using the English ASR/MT output provided by NIST. They are free to use the output of other ASR/MT systems in additional runs.

Participants may use other development resources not excluded by these guidelines; such resources should be reported at the workshop. Note that using other resources will change the submission's system development type, described next.

In order to help isolate system development as a factor in system performance, each feature extraction task submission, search task submission, or donation of extracted features must declare its type:

A - system trained only on common TRECVID development collection data, i.e., annotations and truth data publicly available to all participants. Such data include the TRECVID common annotations from 2003 and 2005, the LSCOM-lite and full LSCOM annotations, the results of NIST judging in earlier TRECVIDs, the MediaMill baseline from 2006, and any training data created for 2007 and shared with all participants, including the results of the active learning annotation and the data provided by the MCG-ICT-CAS team for 2007.
Because, by design, most features in the common training data have multiple annotators, and it is not at all clear how best to combine those sources of evidence, it seems advisable to allow groups using the common annotation to choose a subset of it and still qualify as using type A training. Doing so may be equivalent to adding new negative judgments; however, no new positive judgments may be added.
B - system trained only on the common development collection, but not only on the common annotation of it
C - system is not of type A or B

There continues to be special interest in how well systems trained on one sort of data generalize to another related but different type of data, with little or no new training data. Some of the available training data is specific to the Sound and Vision video and some is not. Therefore, we are introducing three additional training categories:

a - same as A, but no training data (shared or private) specific to any Sound and Vision data has been used in the construction or running of the system.
b - same as B, but no training data (shared or private) specific to any Sound and Vision data has been used in the construction or running of the system.
c - same as C, but no training data (shared or private) specific to any Sound and Vision data has been used in the construction or running of the system.
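The type rules above amount to a small decision procedure. The following is a minimal sketch, not an official NIST tool, assuming a run's training history can be summarized by three yes/no properties. The names `TrainingProfile`, `system_type`, and `qualifies_as_common_subset`, as well as the shot IDs in the usage lines, are hypothetical. The sketch also assumes that a run qualifying for a lowercase category declares it, and it encodes the type A rule that a chosen subset of the common annotation may drop positives but never add them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingProfile:
    """Hypothetical summary of what a run was trained on (not an official NIST format)."""
    # True if all training video comes from the common TRECVID development collection.
    only_common_collection: bool
    # True if only common, publicly shared annotations/truth data were used
    # (possibly a subset of the annotators, as type A allows).
    only_common_annotations: bool
    # True if any training data (shared or private) specific to Sound and Vision was used.
    used_sound_and_vision_data: bool


def system_type(p: TrainingProfile) -> str:
    """Return the declared training type: A, B, C, or lowercase a, b, c."""
    if p.only_common_collection and p.only_common_annotations:
        base = "A"
    elif p.only_common_collection:
        base = "B"
    else:
        base = "C"
    # Lowercase marks runs built and run without Sound-and-Vision-specific training data.
    return base if p.used_sound_and_vision_data else base.lower()


def qualifies_as_common_subset(derived_positives: set[str], common_positives: set[str]) -> bool:
    """Type A subset rule: every positive used must already be positive in the
    common annotation; dropping positives (adding negatives) is allowed."""
    return derived_positives <= common_positives


# Usage with hypothetical values.
example = TrainingProfile(
    only_common_collection=True,
    only_common_annotations=True,
    used_sound_and_vision_data=False,
)
print(system_type(example))  # prints "a"
print(qualifies_as_common_subset({"shot1_2"}, {"shot1_2", "shot3_4"}))  # prints "True"
```

The set-inclusion check reflects only the stated constraint on positive judgments; how a group chooses which annotators to keep is left open by these guidelines.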

We encourage groups to submit, from their allowable total, at least one pair of runs that helps the community understand how well systems trained on non-Sound-and-Vision data generalize to Sound-and-Vision data.