Restrictions on use of development and test data in the search and semantic indexing tasks

Each participating group is responsible for adhering to the letter and spirit of these rules, the intent of which is to make the TRECVID evaluation realistic, fair, and maximally informative about system effectiveness as opposed to other confounding effects on performance. Submissions that, in the judgment of the coordinators and NIST, do not comply will not be accepted.

Test data

The test data cannot be used for system development and system developers should have no knowledge of it until after they have submitted their results for evaluation to NIST. Depending on the size of the team and tasks undertaken, this may mean isolating certain team members from certain information or operations, freezing system development early, etc.

Participants may use donated semantic indexing output from the test collection but incorporation of such features should be automatic so that system development is not affected by knowledge of the extracted features. Anyone doing searches must be isolated from knowledge of that output.

Development data

The development data is intended for the participants' use in developing their systems. It is up to the participants how the development data is used, e.g., how it is divided into training and validation data.

Participants may use other development resources not excluded in these guidelines. Such resources must be reported at the workshop. Note that use of other resources will change the submission's status with respect to system development type, which is described next.

To help isolate system development as a factor in system performance, each semantic indexing task submission or donation of extracted features must declare its training type:

If the run is a SIN run of the "no annotation" sort then choose from the following 2 training types:
- E - used only training data collected automatically using only the concepts' name and definition
- F - used only training data collected automatically using a query built manually from the concepts' name and definition
  As the name "no annotation" indicates, for the categories E and F, no manual annotation should be done on the automatically collected data; automatic processing is allowed and encouraged but data should be processed blindly.
Otherwise choose from the following 4 training types:
- A - used only IACC training data
- B - used only non-IACC training data
- C - used both IACC and non-IACC TRECVID (S&V and/or Broadcast news) training data
- D - used both IACC and non-IACC non-TRECVID training data
  Please note a change to a stricter interpretation of categories A through D: all data used for training at any level of any system component must be considered. This means that even the use of something like a face detector that was trained on non-IACC training data would disqualify the run from type A. As a consequence, some systems accepted in category A in previous years will be placed in category B, C or D under the new, stricter rules.
  Although the categories will be applied more strictly than in previous years, they will be used only as information to clarify what the participants did. They will not be used to present the results in separate tables and figures. There will be a single global ranking and a single global plot covering all systems, and each run will simply be tagged (as previously) with its category as part of the run name generated at NIST.
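
As an illustration only, the choice among categories A through D could be sketched in Python as follows. The `Source` labels, the `training_type` function and the example component names are hypothetical and not part of these guidelines. The sketch also assumes that a run mixing IACC data with both kinds of non-IACC data is reported as D, a case the category definitions do not address explicitly.

```python
from enum import Enum


class Source(Enum):
    """Origin of a training data set (hypothetical labels for this sketch)."""
    IACC = "IACC"
    TRECVID_NON_IACC = "non-IACC TRECVID"   # e.g. S&V or Broadcast news
    NON_TRECVID = "non-IACC non-TRECVID"


def training_type(component_sources):
    """Return 'A', 'B', 'C' or 'D' for a run that is not of the 'no annotation' sort.

    component_sources maps each system component to the set of Source values
    its models were trained on. Every component counts, at any level.
    """
    used = set()
    for sources in component_sources.values():
        used |= set(sources)
    if not used:
        raise ValueError("no training data declared")
    if used == {Source.IACC}:
        return "A"
    if Source.IACC not in used:
        return "B"
    # Assumption: when IACC data is mixed with both kinds of non-IACC data,
    # the non-TRECVID data decides the category.
    if Source.NON_TRECVID in used:
        return "D"
    return "C"


# A concept classifier trained on IACC data, combined with a face detector
# trained on outside data, is no longer a type A run.
run = {
    "concept_classifier": {Source.IACC},
    "face_detector": {Source.NON_TRECVID},
}
print(training_type(run))  # prints D
```

Taking the union over all components reflects the stricter interpretation: a single component trained on non-IACC data is enough to move a run out of category A.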

We encourage groups to submit at least one pair of runs from their allowable total that helps the community understand how well systems trained on non-IACC data generalize to IACC test data.