SECTION 1. System Description.

The system is based exclusively on comparing low-level visual features between videos. Several low-level local descriptors are extracted from each video. Event detection consists of performing approximate similarity searches between the descriptors of the test videos (MED14Sub) and the descriptors of the example videos (10Ex and 100Ex). Classification is performed by a voting algorithm which, for each event, counts the retrieved k-NN descriptors that belong to each training video. The system does not use any semantic concepts or external training data; therefore there is no mapping between concepts and visual descriptors.

SECTION 2. Metadata Generator Description.

The low-level features extracted from training videos and evaluation videos varied between submissions.

PS event detection, 10Ex: one modality is computed.
-training data: 20 frames/video, frame size 450x250, CSIFT descriptors (192-d), Hessian-Laplace point detector.
-eval data: 5 frames/video, frame size 400x225, CSIFT descriptors (192-d), Hessian-Laplace point detector.

PS event detection, 100Ex: one modality is computed.
-training data: 5 frames/video, frame size 200x150, CSIFT descriptors (192-d), Hessian-Laplace point detector.
-eval data: 5 frames/video, frame size 200x150, CSIFT descriptors (192-d), Hessian-Laplace point detector.

Ad-hoc event detection, 10Ex and 100Ex: for both training data and eval data, three modalities are computed.
-5 frames/video, frame size 400x225, CSIFT descriptors (192-d), Hessian-Laplace point detector.
-10 frames/video, frame size 400x225, CSIFT descriptors (192-d), MSER point detector.
-10 frames/video, frame size 450x250, SIFT descriptors (128-d), DoG point detector.

SIFT implementation: http://www.vlfeat.org/
CSIFT implementation: http://kahlan.eps.surrey.ac.uk/featurespace/web/

An illustrative sketch of this per-modality extraction step is given after Section 6.

SECTION 3. Semantic Query Generator Description.

The system does not use semantics; therefore the Semantic Query Generator just returns the event id.

SECTION 4. Event Query Generator Description.

The system does not use semantics; therefore the Event Query Generator just returns the event id.

SECTION 5. Event Search Description.

For each modality (1 in PS, 3 in Ad-hoc):
-The descriptors of the training videos (either 10Ex or 100Ex) and of the background videos are loaded and indexed using multiple KD-trees (5 trees).
-For each video in MED14Sub:
--its descriptors are loaded and an approximate K-NN search is performed (K varies between 5 and 20 depending on the modality);
--the number of tree leaves visited during the search is limited in order to spend at most T seconds per video (T varies between 0.5 and 2 seconds depending on the modality);
--each retrieved K-NN descriptor adds one vote to the event of the training video it belongs to (descriptors retrieved from background videos add no votes).

The following spatial restriction is applied in order to reduce noise: votes are counted only when at least 5 of them fall on the same frame. The final score of each event is the sum of the votes it receives over all modalities. Scores are normalized so that the maximum score is 1. The detection threshold is fixed at the score achieved by the fifth-ranked answer. Sketches of the indexing, voting, and fusion steps are given after Section 6.

SECTION 6. Training data and knowledge sources.

The system does not use any external information or metadata language.
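
APPENDIX. Illustrative sketches.

The following sketches illustrate the pipeline described above. They are minimal illustrations, not the submission code: the libraries (OpenCV, pyflann, numpy) and all function and variable names are assumptions introduced here. The first sketch corresponds to the descriptor extraction of Section 2: sampling N frames uniformly from a video, resizing them, and extracting local descriptors. OpenCV's SIFT (128-d, DoG detector) is used only as a stand-in for the vlfeat SIFT and Surrey CSIFT implementations actually used.

    import cv2
    import numpy as np

    def extract_descriptors(video_path, n_frames=10, frame_size=(450, 250)):
        # Sample n_frames uniformly, resize, and extract local descriptors.
        # Stand-in extraction: the submission used vlfeat SIFT (128-d, DoG)
        # and Surrey CSIFT (192-d, Hessian-Laplace or MSER); OpenCV SIFT
        # merely illustrates the per-modality extraction step.
        cap = cv2.VideoCapture(video_path)
        total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
        sift = cv2.SIFT_create()
        descriptors, frame_ids = [], []
        for i in np.linspace(0, total - 1, n_frames, dtype=int):
            cap.set(cv2.CAP_PROP_POS_FRAMES, int(i))
            ok, frame = cap.read()
            if not ok:
                continue
            frame = cv2.resize(frame, frame_size)
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            _, desc = sift.detectAndCompute(gray, None)
            if desc is not None:
                descriptors.append(desc)
                # Remember which sampled frame each descriptor came from;
                # this is needed later by the "at least 5 votes on the
                # same frame" restriction of Section 5.
                frame_ids.extend([int(i)] * len(desc))
        cap.release()
        if not descriptors:
            return np.empty((0, 128), np.float32), np.empty(0, int)
        return np.vstack(descriptors), np.array(frame_ids)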
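The next sketch corresponds to the indexing and approximate K-NN search step of Section 5. It assumes a FLANN-style library (pyflann here; the description does not name the library actually used). FLANN's "checks" parameter bounds the number of tree leaves visited per query, which is one way the per-video time budget T could be enforced.

    import numpy as np
    from pyflann import FLANN

    # train_desc: all descriptors from the training (10Ex/100Ex) and
    # background videos, stacked row-wise. desc_owner[i] gives the
    # (event_id, video_id) of the video descriptor i came from, with
    # event_id = None for background videos (names assumed).
    flann = FLANN()
    flann.build_index(train_desc.astype(np.float32),
                      algorithm='kdtree', trees=5)   # 5 randomized KD-trees

    def knn_search(test_desc, k=10, checks=128):
        # 'checks' limits the leaves visited per query; lowering it trades
        # accuracy for speed, which is how the T-seconds-per-video budget
        # (0.5 to 2 s depending on the modality) could be met.
        idx, dist = flann.nn_index(test_desc.astype(np.float32),
                                   num_neighbors=k, checks=checks)
        return idx, dist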
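The next sketch corresponds to the voting step of Sections 1 and 5, in plain Python. It interprets "at least 5 votes to the same frame" as requiring 5 matches onto the same training frame of the same training video; this reading is an assumption.

    from collections import defaultdict

    def vote(neighbor_idx, desc_owner, desc_frame, min_votes_per_frame=5):
        # Accumulate k-NN votes per event with the spatial restriction.
        # neighbor_idx : (n_test_desc, k) indices into the training index
        # desc_owner   : per training descriptor, (event_id, video_id);
        #                event_id is None for background videos
        # desc_frame   : per training descriptor, its source frame id
        raw = defaultdict(int)
        for row in neighbor_idx:
            for j in row:
                event_id, video_id = desc_owner[j]
                if event_id is None:          # background video: no vote
                    continue
                raw[(event_id, video_id, desc_frame[j])] += 1
        # Spatial restriction: only frames that gathered at least
        # min_votes_per_frame votes contribute to the event score.
        scores = defaultdict(int)
        for (event_id, _vid, _frame), n in raw.items():
            if n >= min_votes_per_frame:
                scores[event_id] += n
        return scores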
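Finally, a sketch of the score fusion of Section 5: summing votes over the modalities, normalizing so the maximum score is 1, and fixing the threshold at the fifth-ranked answer. Interpreting "top-5 answer" as the fifth-highest score in the ranked list is an assumption.

    def fuse_and_threshold(per_modality_scores):
        # per_modality_scores: list of {event_id: votes} dicts, one per
        # modality (1 in PS, 3 in Ad-hoc).
        total = defaultdict(float)
        for scores in per_modality_scores:
            for event_id, v in scores.items():
                total[event_id] += v
        if not total:
            return {}, 0.0, {}
        peak = max(total.values())
        normalized = {e: v / peak for e, v in total.items()}
        # Threshold fixed at the score of the fifth-ranked answer.
        ranked = sorted(normalized.values(), reverse=True)
        threshold = ranked[4] if len(ranked) >= 5 else ranked[-1]
        detected = {e: s for e, s in normalized.items() if s >= threshold}
        return normalized, threshold, detected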