Section 1. System Description

A video is represented using low-level visual information (static and motion descriptors) and a sequence of model vectors, as explained in the following. To capture motion information, improved dense trajectories with Fisher Vector (FV) encoding are used to describe the entire video with a high-dimensional motion feature vector. To extract static visual descriptors and model vectors, each video is decoded and a set of key-frames is extracted at fixed temporal intervals (one key-frame every 6 seconds). Then SIFT, opponentSIFT, rgbSIFT and rgbSURF descriptors are extracted from each key-frame using dense sampling, followed by VLAD encoding (applied separately to each of the four descriptors). Each key-frame is also represented with a set of model vectors (i.e., vectors of responses of concept detectors); these detectors are built using the aforementioned VLAD-encoded static features and linear SVMs, and cover 346 concepts (the TRECVID SIN 2014 dataset concepts; the detectors are those trained for the SIN 2014 task). Subsequently, the VLAD-encoded static features and the model vectors of all key-frames of a video are averaged, yielding a set of global video descriptor vectors and a global model vector. All of the above feature vectors of a video are then concatenated into a single high-dimensional feature vector. These feature vectors are used to build one detector per event; each detector combines a new, very fast nonlinear discriminant analysis (DA) technique for dimensionality reduction with a linear SVM for the final classification. For the MER task, the derived key-frame-level model vectors are employed, along with a variant of the above DA method, to identify the most characteristic concepts for the specified event and the given video; this also allows for temporal localization, by inspecting the corresponding key-frame-level model vectors before their temporal averaging into a single model vector per video.

Section 2. Metadata Generator Description

Motion and static visual features are exploited as described in the following (minimal illustrative sketches of these encoding and fusion steps follow this section):
- Improved dense trajectories (DT) are employed, providing the following low-level features: Histogram of Oriented Gradients (HOG), Histogram of Optical Flow (HOF) and Motion Boundary Histograms (MBH). Hellinger kernel normalization is applied to the resulting feature vectors, followed by Fisher Vector (FV) encoding with 256 GMM codewords. Subsequently, the three feature vectors are concatenated to yield the final motion feature descriptor of each video.
- Each video is decoded into a set of key-frames at fixed temporal intervals. Four different local descriptors (SIFT, opponentSIFT, rgbSIFT, rgbSURF) with dense sampling are applied to extract local visual information from every key-frame. The extracted low-level features are aggregated into a global image representation using VLAD encoding.
- Each key-frame is also represented with a set of model vectors, one per feature extraction procedure, using 346 pre-trained concept detectors. The concepts used are the TRECVID SIN 2014 dataset concepts. The model vectors referring to the same key-frame are aggregated using the arithmetic mean operator, and subsequently the model vectors of all key-frames of a video are averaged to represent the video.
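The motion encoding can be sketched as follows. This is a minimal Fisher Vector implementation, assuming a diagonal-covariance GMM (here scikit-learn's GaussianMixture with 256 components, matching the codebook size above) has already been fitted on a sample of trajectory descriptors; the Hellinger normalization helper and all function names are our own illustrative choices, not the authors' implementation.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def hellinger_normalize(X):
    """RootSIFT-style Hellinger normalization: L1-normalize each row, then signed sqrt."""
    X = X / (np.abs(X).sum(axis=1, keepdims=True) + 1e-12)
    return np.sign(X) * np.sqrt(np.abs(X))

def fisher_vector(X, gmm):
    """Encode local descriptors X (N x D) into a 2*K*D Fisher Vector,
    using a fitted diagonal-covariance GMM."""
    N = X.shape[0]
    gamma = gmm.predict_proba(X)                     # posteriors, N x K
    w, mu, var = gmm.weights_, gmm.means_, gmm.covariances_
    sigma = np.sqrt(var)                             # K x D (diagonal std devs)
    parts = []
    for k in range(gmm.n_components):
        diff = (X - mu[k]) / sigma[k]                # N x D
        g = gamma[:, k:k + 1]                        # N x 1
        parts.append((g * diff).sum(axis=0) / (N * np.sqrt(w[k])))               # grad wrt means
        parts.append((g * (diff ** 2 - 1)).sum(axis=0) / (N * np.sqrt(2 * w[k])))  # grad wrt stds
    fv = np.concatenate(parts)
    fv = np.sign(fv) * np.sqrt(np.abs(fv))           # power normalization
    return fv / (np.linalg.norm(fv) + 1e-12)         # L2 normalization

# gmm = GaussianMixture(256, covariance_type="diag").fit(hellinger_normalize(sample))
# video_fv = fisher_vector(hellinger_normalize(trajectory_features), gmm)
```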
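The static descriptors are aggregated per key-frame with VLAD. The sketch below assumes a k-means codebook learned offline; the codebook size and all names are illustrative, since the paper does not specify the VLAD vocabulary size.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlad(X, kmeans):
    """Encode local descriptors X (N x D) into a K*D VLAD vector."""
    centers = kmeans.cluster_centers_                # K x D codebook
    assign = kmeans.predict(X)                       # nearest codeword per descriptor
    v = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = X[assign == k]
        if members.size:
            v[k] = (members - centers[k]).sum(axis=0)  # accumulate residuals
    v = v.ravel()
    v = np.sign(v) * np.sqrt(np.abs(v))              # power normalization
    return v / (np.linalg.norm(v) + 1e-12)           # L2 normalization

# kmeans = KMeans(n_clusters=256).fit(sample_descriptors)  # codebook size is illustrative
# keyframe_vlad = vlad(dense_sift_descriptors, kmeans)
```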
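Finally, the temporal averaging and concatenation described in Sections 1 and 2 reduce to the following sketch; the array shapes are illustrative, except for the 346-dimensional model vectors stated in the text.

```python
import numpy as np

def video_level_vector(static_vlads, model_vecs, motion_fv):
    """Temporal averaging and concatenation, as in Section 1.
    static_vlads: list of (n_keyframes x d_i) VLAD matrices, one per local descriptor;
    model_vecs:   n_keyframes x 346 detector responses (already averaged over
                  the four feature types per key-frame);
    motion_fv:    the 1-D Fisher Vector of the whole video."""
    averaged = [m.mean(axis=0) for m in static_vlads]   # global static descriptors
    averaged.append(model_vecs.mean(axis=0))            # global model vector
    averaged.append(motion_fv)
    return np.concatenate(averaged)                     # single high-dimensional vector
```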
Section 3. Semantic Query Generator Description

Our semantic queries are produced manually, by visual inspection of the event description kit and of our concept detectors. Thus, a set of selected concepts is used as the semantic query for each event.

Section 4. Event Query Generator Description

We use nonlinear discriminant analysis (DA) to derive a lower-dimensional embedding of the original data, and then employ fast linear SVMs in the resulting subspace to learn the events. In particular, for dimensionality reduction we utilize a new, very fast kernel subclass-based method, which has been shown to outperform other DA approaches. (A minimal sketch of this two-stage pipeline appears at the end of this document.)

Section 5. Event Search Description

We apply each of the trained event detectors to every video of the MED14-EvalSub set and rank the resulting scores in descending order. The 300th score of the ranked list is selected as the threshold for the 010Ex task, and the 100th score for the 100Ex task (see the thresholding sketch at the end of this document).

Section 6. Training Data and Knowledge Sources

a. TRECVID SIN 2014 dataset, for training the concept detectors
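Returning to Section 4, the overall two-stage event detector can be sketched as a projection step followed by a linear SVM. The authors' fast kernel subclass-based DA is not reproduced here; scikit-learn's LinearDiscriminantAnalysis serves only as a generic stand-in for the projection, and all names are illustrative.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

def build_event_detector():
    """Two-stage detector: discriminant projection, then a linear SVM.
    LinearDiscriminantAnalysis is a placeholder for the paper's fast
    kernel subclass-based DA, which is not reproduced here."""
    return make_pipeline(
        StandardScaler(),
        LinearDiscriminantAnalysis(),
        LinearSVC(C=1.0),
    )

# detector = build_event_detector().fit(X_train, y_train)  # y: event vs. background
# scores = detector.decision_function(X_test)
```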
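The score thresholding of Section 5 amounts to picking a fixed rank in the sorted score list, as in this short sketch (function names are ours; the ranks 300 and 100 are from the text).

```python
import numpy as np

def score_threshold(scores, rank):
    """Return the rank-th highest detector score as the detection threshold."""
    return np.sort(scores)[::-1][rank - 1]

# Per Section 5: the 300th score for the 010Ex task, the 100th for 100Ex.
# thr_010ex = score_threshold(video_scores, 300)
# thr_100ex = score_threshold(video_scores, 100)
```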