Section 1 System Description: Our system is composed of three modules: 1) the Metadata Generator model, 2) the Event Query Generator model, and 3) the Event Search model. The metadata is a feature-vector representation that describes each video to our system.

Section 2 Metadata Generator Description: The Metadata Generator model extracts metadata from the videos in the input video set. Both audio and visual features are extracted in this model. Salient trajectories are used to capture motion information, and several features (such as HOG, HOF, and MBH) are computed along the trajectories. Mel-frequency cepstral coefficients (MFCCs) are used to describe the audio. To encode the features, we employ the Fisher Vector model and compute one Fisher vector over the complete video.

Section 3 Semantic Query Generator Description: We did not use semantics in our queries.

Section 4 Event Query Generator Description: In this model, we use an early fusion strategy to combine the trajectory-based features; a linear support vector machine (SVM) is then trained as the event detector for each event.

Section 5 Event Search Description: This model performs the search over the metadata store and produces search results for all videos. A late fusion strategy is used to combine the classifier scores computed for the audio and trajectory features.

Section 6 Training data and knowledge sources: We did not use external data in our system.
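
The per-video Fisher Vector encoding described in Section 2 can be sketched as follows. This is a minimal illustration, not our production code: the 16-dimensional random descriptors stand in for the real trajectory descriptors (HOG/HOF/MBH), and the GMM size is an arbitrary assumption. It uses the standard improved-Fisher-Vector gradients with respect to the GMM means and variances, followed by power and L2 normalization.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Improved Fisher Vector for a set of local descriptors:
    gradients w.r.t. GMM means and variances, then power- and
    L2-normalisation. Returns a vector of size 2 * K * D."""
    X = np.atleast_2d(descriptors)          # (N, D)
    N, _ = X.shape
    q = gmm.predict_proba(X)                # (N, K) soft assignments
    pi = gmm.weights_                       # (K,)
    mu = gmm.means_                         # (K, D)
    sigma = np.sqrt(gmm.covariances_)       # (K, D), diagonal covariances

    parts = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / sigma[k]       # standardised residuals
        # gradient w.r.t. the k-th mean
        g_mu = (q[:, k, None] * diff).sum(axis=0) / (N * np.sqrt(pi[k]))
        # gradient w.r.t. the k-th variance
        g_sig = (q[:, k, None] * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * pi[k]))
        parts.extend([g_mu, g_sig])
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))  # power normalisation
    return fv / (np.linalg.norm(fv) + 1e-12)  # L2 normalisation

# toy usage: 500 fake 16-dim trajectory descriptors pooled over one video
rng = np.random.default_rng(0)
descs = rng.normal(size=(500, 16))
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(descs)
fv = fisher_vector(descs, gmm)
print(fv.shape)  # (256,) = 2 * 8 components * 16 dims
```

In practice the GMM would be fitted once on descriptors sampled from the training videos, and one such vector is computed over each complete video.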
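
The early-fusion-plus-linear-SVM step of Section 4 amounts to concatenating the per-video feature vectors and training one one-vs-rest detector per event. A minimal sketch, with synthetic features, synthetic labels, and hypothetical event names standing in for the real data:

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n_videos = 40

# stand-ins for per-video encodings of the three trajectory features
fv_hog, fv_hof, fv_mbh = (rng.normal(size=(n_videos, 64)) for _ in range(3))

# early fusion: concatenate the trajectory-based features per video
X = np.hstack([fv_hog, fv_hof, fv_mbh])            # (40, 192)

# one linear SVM detector per event (event names are hypothetical)
events = {"birthday_party": rng.integers(0, 2, n_videos),
          "parade": rng.integers(0, 2, n_videos)}
detectors = {name: LinearSVC(C=1.0).fit(X, y) for name, y in events.items()}

# per-video detection scores for one event
scores = detectors["parade"].decision_function(X)
print(X.shape, scores.shape)
```

Concatenating before training (early fusion) lets the SVM weight the three trajectory channels jointly, in contrast to the score-level (late) fusion used in Section 5.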
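
The late fusion in Section 5 combines the audio and trajectory classifier scores at search time. One common realization, sketched here under the assumption of min-max score normalization and a weighted average (the weight is a tunable assumption, not a value from our system):

```python
import numpy as np

def late_fuse(score_audio, score_traj, w_audio=0.3):
    """Weighted-average late fusion of two per-video score lists.
    Each channel is min-max normalised to [0, 1] first so the two
    classifiers' outputs are comparable."""
    def norm(s):
        s = np.asarray(s, dtype=float)
        span = s.max() - s.min()
        return (s - s.min()) / span if span > 0 else np.zeros_like(s)
    return w_audio * norm(score_audio) + (1 - w_audio) * norm(score_traj)

# toy SVM decision values for three videos, one value per channel
audio = np.array([-1.2, 0.4, 2.0])
traj = np.array([0.1, 0.9, 0.5])
fused = late_fuse(audio, traj)
print(fused)  # [0.   0.85 0.65]
```

The fused scores are then sorted to produce the ranked search results for each event.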