Section 1 System Description
The metadata vectors of the event videos are used to train linear SVM classifiers. The final video score is a linear combination of the per-channel classifier scores (a sketch of this late fusion is given after Section 6).

Section 2 Metadata Generator Description
We used the following descriptors for each video:
- dense trajectory descriptors (MBH + HOG + HOF)
- SIFT and color descriptors
- CNN layer-6 features
- MFCC and scattering descriptors for audio
- automatic speech recognition (ASR) transcripts (this is the only difference from the INRIA-LIM-VocR submission)
- optical character recognition (OCR) output
- two attribute descriptors: CNN output for the ImageNet classes and classifier output for the HMDB51 classes

Local descriptors were aggregated into global descriptors using Fisher vectors (a minimal sketch of this encoding is also given after Section 6).

Section 3 Semantic Query Generator Description
The event descriptions were manually converted to:
- classes picked from the HMDB51 and ImageNet datasets
- keywords that can be matched against the ASR and OCR output

Section 4 Event Query Generator Description
During EQG, a linear SVM is trained for each channel, and the channel weights of the final linear combination are determined by cross-validation.

Section 5 Event Search Description
The ES computes the scalar product of each SVM's weight vector w with the image/video descriptors.

Section 6 Training data and knowledge sources
We used the HMDB51 and ImageNet datasets as external training data, and WordNet as a source for the vocabulary.
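
The following is a minimal sketch of the Fisher vector aggregation described in Section 2, assuming scikit-learn's diagonal-covariance GaussianMixture; the descriptor dimension (64) and the number of mixture components (16) are illustrative placeholders, not the values used in the actual system.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def fisher_vector(local_descs, gmm):
        # Aggregate local descriptors (N x D) into one global vector using the
        # gradients of a diagonal-covariance GMM w.r.t. its means and variances.
        N, _ = local_descs.shape
        q = gmm.predict_proba(local_descs)          # soft assignments, N x K
        pi, mu = gmm.weights_, gmm.means_
        sigma = np.sqrt(gmm.covariances_)           # per-dimension std, K x D
        parts = []
        for k in range(gmm.n_components):
            d = (local_descs - mu[k]) / sigma[k]    # normalized deviations
            qk = q[:, k:k + 1]
            g_mu = (qk * d).sum(axis=0) / (N * np.sqrt(pi[k]))
            g_sig = (qk * (d ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * pi[k]))
            parts.extend([g_mu, g_sig])
        fv = np.concatenate(parts)
        fv = np.sign(fv) * np.sqrt(np.abs(fv))      # power normalization
        return fv / max(np.linalg.norm(fv), 1e-12)  # L2 normalization

    # Illustrative usage: 64-dim local descriptors, 16-component GMM
    rng = np.random.default_rng(0)
    gmm = GaussianMixture(n_components=16, covariance_type="diag")
    gmm.fit(rng.standard_normal((2000, 64)))        # vocabulary training
    video_fv = fisher_vector(rng.standard_normal((500, 64)), gmm)  # dim 2*16*64

The signed square-root and L2 normalization steps follow the standard improved Fisher vector recipe.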
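
Sections 1, 4, and 5 can be summarized by the sketch below, again assuming scikit-learn. One linear SVM is trained per channel; setting each channel's fusion weight to its cross-validated average precision is an illustrative stand-in, since the text only states that the weights are tuned by cross-validation, and the regularization parameter C and the 5-fold split are likewise placeholders. Event search then reduces to the per-channel scalar products computed by decision_function.

    import numpy as np
    from sklearn.svm import LinearSVC
    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import average_precision_score

    def event_query_generator(channels, labels, C=1.0):
        # channels: list of (n_videos x d_channel) metadata matrices.
        # Trains one linear SVM per channel; the fusion weight of a channel
        # is its cross-validated average precision (illustrative choice).
        svms, weights = [], []
        for X in channels:
            svms.append(LinearSVC(C=C).fit(X, labels))
            cv_scores = cross_val_predict(LinearSVC(C=C), X, labels,
                                          cv=5, method="decision_function")
            weights.append(average_precision_score(labels, cv_scores))
        w = np.asarray(weights)
        return svms, w / w.sum()                    # normalized fusion weights

    def event_search(svms, weights, channels):
        # ES step: each decision_function is the scalar product w . x (+ bias);
        # the final video score is the weighted sum over channels.
        per_channel = [clf.decision_function(X) for clf, X in zip(svms, channels)]
        return weights @ np.vstack(per_channel)     # one fused score per video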