System description

Section 1 - System description
The MediaMill system uses deep-learning-inspired features at the video level, in combination with a VideoStory embedding [1] and an SVM classifier, to find positive videos. Recounting is done using deep learning features at the frame level.

Section 2 - Metadata Generator Description
The system computes the deep learning features on two frames per second. The features are averaged per video to obtain a video-level representation.

Section 3 - Semantic Query Generator Description
The semantic query generator takes the event description and turns it into a VideoStory representation. For recounting, the system uses co-occurrence statistics [2], estimated from the ImageNet Fall 2011 release, to select 100 potentially interesting tags per event.

Section 4 - Event Query Generator Description
The event query generator uses the deep learning feature with a VideoStory embedding. A trained SVM classifier serves as the event detector. For recounting, the system splits the exemplar videos into fragments and selects the top 50 per video based on their discriminative capability.

Section 5 - Event Search Description
The system ranks videos by the SVM classification scores computed on the deep learning features. For recounting, the fragments with the highest resemblance to the top 50 are considered key evidence, and their corresponding tags are reported.

Section 6 - Training data and knowledge sources
The deep learning features are based on annotations from ImageNet. The VideoStory embedding is based on videos crawled from YouTube [1].

References
[1] A. Habibian, T. Mensink, and C.G.M. Snoek. VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events. In ACM Multimedia, 2014.
[2] T. Mensink, E. Gavves, and C.G.M. Snoek. COSTA: Co-Occurrence Statistics for Zero-Shot Classification. In CVPR, 2014.
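
Appendix - Illustrative sketch
The pooling and ranking pipeline of Sections 2 and 5 (frame-level features averaged into a video-level representation, then scored by an SVM) can be sketched as follows. This is a minimal illustration, not the actual system: the feature dimension, the random stand-in features, and the choice of a linear SVM are all assumptions not specified above.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

def video_representation(frame_features):
    """Average frame-level features (Section 2) into one video-level vector."""
    return np.mean(frame_features, axis=0)

# Toy stand-in data: 3 "positive" and 3 "negative" videos of varying length,
# with synthetic 16-dim frame features (the real system extracts deep
# learning features at two frames per second).
videos = [rng.normal(loc=(1.0 if i < 3 else -1.0), size=(n, 16))
          for i, n in enumerate([10, 12, 8, 10, 12, 8])]
X = np.stack([video_representation(v) for v in videos])
y = np.array([1, 1, 1, 0, 0, 0])

# Train the event detector (Section 4, sketched here as a linear SVM).
clf = LinearSVC().fit(X, y)

# Event search (Section 5): rank an unseen video by its SVM decision
# score; higher scores indicate more likely positives.
test_video = rng.normal(loc=1.0, size=(20, 16))
score = clf.decision_function(video_representation(test_video)[None, :])
print(float(score[0]))
```

In this toy setup the positive and negative classes are well separated, so the held-out positive-like video receives a positive decision score and would rank above the negatives.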