-------------------------------------
Section 1 System Description
-------------------------------------

000Ex System:

Our 000Ex system takes an information retrieval approach to finding videos without any exemplars. We handled the text modalities with language modeling, indexing the OCR and ASR data with Krovetz stemming. We manually edited the event descriptions to remove vague or negative language, and then automatically filtered the results with the Lemur 418-word stoplist. We then broke the queries into fields, each scored under the sequential dependence model (SDM) with Dirichlet smoothing customized for each modality, and formed the final query as a linear combination of these fields. For each video concept detector, we built a language model from the top results from the ClueWeb '09-B corpus. Representing each concept with text let us map the input query onto video concepts without using any exemplars, and treat the concept detections as if they were words under a query-likelihood model.

010Ex System:

Our 010Ex system employed the following low-level features:
  DenseSIFT
  HessianAffine
  ColorSift
  TCH
  DenseTrack_hog
  DenseTrack_mbh
  CMUAudio

It also included the following high-level features:
  DeepCaffe
  OverFeat
  Action Concepts
  ASR/OCR features

For low-level features, we used SVM classifiers with a fast histogram intersection kernel. For high-level features, we used SVM classifiers with an RBF kernel.
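To illustrate the histogram intersection kernel named for the low-level features, here is a minimal sketch. It is not our system's implementation: the function names and toy histograms are hypothetical, and it computes the exact kernel rather than the fast approximate evaluation. The resulting Gram matrix could be passed to an SVM trainer that accepts precomputed kernels, such as LIBSVM.

```python
def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of the element-wise minima."""
    if len(x) != len(y):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(x, y))


def gram_matrix(rows, cols):
    """Kernel value between every histogram in rows and every one in cols."""
    return [[histogram_intersection(r, c) for c in cols] for r in rows]


# Toy L1-normalized bag-of-words histograms (made-up values, 4 bins each).
train = [
    [0.5, 0.25, 0.25, 0.0],
    [0.0, 0.25, 0.25, 0.5],
    [0.25, 0.25, 0.25, 0.25],
]

# Square, symmetric training Gram matrix; for L1-normalized histograms
# each diagonal entry equals 1.0.
K = gram_matrix(train, train)
```

Because the kernel only sums per-bin minima, two histograms with no mass in common bins score 0, and identical L1-normalized histograms score 1.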
100Ex System:

Our 100Ex system employed the same set of low-level features as our 010Ex system, but with different representations:
  DenseSIFT_fisher
  DenseTrack_hog_fisher
  DenseTrack_mbh_fisher
  ColorSift_fisher
  DenseSIFT_stcoding
  HessianAffine_stcoding
  ColorSift_stcoding
  TCH_stcoding
  DenseTrack_hog_stcoding
  DenseTrack_mbh_stcoding
  CMUAudio

It also included the following high-level features, which are shared with the 010Ex system:
  DeepCaffe
  OverFeat
  Action Concepts

For Fisher features, we applied a linear SVM directly; for stcoding features, we applied a feature mapping technique followed by a linear SVM. For high-level features, we used SVM classifiers with an RBF kernel.

Contact Information:
  Hui Cheng    hui.cheng@sri.com
  Jingen Liu   jingen.liu@sri.com
  Gary Gan     chuanyong.gan@sri.com

-------------------------------------
Section 2 Training data and knowledge sources
-------------------------------------

We used the NIST SIN training dataset to train our SIN detectors. We also used ImageNet to train our 1000 OverFeat and DeepCaffe concept features. The action concept detectors were trained on third-party open-source videos.

-------------------------------------
Section 3 References
-------------------------------------

1. A. Vedaldi and B. Fulkerson, VLFeat: An Open and Portable Library of Computer Vision Algorithms
2. A. Vedaldi and A. Zisserman, Sparse Kernel Approximations for Efficient Classification and Detection, CVPR 2012
3. http://www.csie.ntu.edu.tw/~cjlin/liblinear/
4. http://www.csie.ntu.edu.tw/~cjlin/libsvm/