Each participating group is responsible for adhering to the letter and spirit of these rules, whose intent is to make the TRECVID evaluation realistic, fair, and maximally informative about system effectiveness as opposed to other confounding effects on performance. Submissions that, in the judgment of the coordinators and NIST, do not comply will not be accepted.
The test data cannot be used for system development, and system developers should have no knowledge of it until after they have submitted their results to NIST for evaluation. Depending on the size of the team and the tasks undertaken, this may mean isolating certain team members from certain information or operations, freezing system development early, etc.
Participants may use donated feature extraction output from the test collection, but incorporation of such features should be automatic, so that system development is not affected by knowledge of the extracted features. Anyone doing searches must be isolated from knowledge of that output.
Participants cannot use, in the development of their systems, the knowledge that the test collection comes from news video recorded during a known time period; this would be unrealistic.
The development data is intended for the participants' use in developing their systems. It is up to the participants how the development data is used, e.g., how it is divided into training and validation sets.
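As a purely illustrative, minimal sketch (not part of these guidelines), one common way to partition development data is a seeded random split; the function name, identifiers, and the 80/20 ratio below are all hypothetical:

    # Minimal sketch: randomly partition development video IDs into
    # training and validation subsets. The 80/20 ratio is only one
    # common choice; the fixed seed keeps the split repeatable.
    import random

    def split_development_data(video_ids, validation_fraction=0.2, seed=42):
        ids = list(video_ids)
        random.Random(seed).shuffle(ids)
        cut = int(len(ids) * (1 - validation_fraction))
        return ids[:cut], ids[cut:]

    # Usage with made-up identifiers:
    train_ids, val_ids = split_development_data(
        ["vid_01", "vid_02", "vid_03", "vid_04", "vid_05"])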
Other data sets created by LDC for earlier evaluations and derived from the same original videos as the test data cannot be used in developing systems for TRECVID 2008.
If participants use the output of an ASR/MT system, they must submit at least one run using the English ASR/MT provided by NIST. They are free to use the output of other ASR/MT systems in additional runs.
Participants may use other development resources not excluded in these guidelines. Such resources should be reported at the workshop. Note that use of other resources will change the submission's status with respect to system development type, which is described next.
Since by design most of the common training data features have multiple annotators, and it is not at all clear how best to combine those sources of evidence, it seems advisable to allow groups using the common annotation to choose a subset of it and still qualify as using type A training. Choosing such a subset may be equivalent to adding new negative judgments; however, no new positive judgments can be added.
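As a minimal sketch of one way to pool a chosen subset of annotators (the annotator names, shot IDs, and labels below are hypothetical), a shot can be counted as positive if at least one selected annotator marked it positive; restricting the subset can then only turn pooled positives into negatives, never create new positives:

    # Minimal sketch: pool binary judgments from a selected subset of
    # annotators. A shot is pooled-positive if any selected annotator
    # labeled it 1, so shrinking the subset can only remove positives
    # (equivalent to adding negative judgments), never add new ones.
    def pool_positives(judgments, annotators):
        # judgments: {annotator_name: {shot_id: 1 or 0}}
        return {shot
                for name in annotators
                for shot, label in judgments[name].items()
                if label == 1}

    judgments = {"ann1": {"shot1_1": 1, "shot1_2": 0},
                 "ann2": {"shot1_1": 0, "shot1_2": 1}}
    pool_positives(judgments, ["ann1", "ann2"])  # {'shot1_1', 'shot1_2'}
    pool_positives(judgments, ["ann1"])          # {'shot1_1'} -- only shrinks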
There continues to be special interest in how well systems trained on one sort of data generalize to a related but different type of data with little or no new training data. The available training data include some that are specific to the Sound and Vision video and some that are not. Therefore we are introducing three additional training categories:
We encourage groups to submit at least one pair of runs from their allowable total that helps the community understand how well systems trained on non-Sound-and-Vision data generalize to Sound-and-Vision data.