-------------------------------------
Section 1  System Description 
-------------------------------------
000Ex System:
Our 000Ex system takes an information retrieval approach to the problem of finding videos without any exemplars. We handled the text modalities with language modeling, indexing the OCR and ASR data with Krovetz stemming. We manually edited the event descriptions to remove vague or negative language, and then automatically filtered the edited descriptions using the Lemur 418-word stoplist.
We then broke the queries into fields, each considered under the sequential dependence model (SDM) with Dirichlet smoothing customized for each modality, and formed the final query as a linear combination of these fields. For each video concept detector, we built a language model using the top results from the ClueWeb09-B corpus.
By representing each concept with text, we were able to map from our input query into video concepts without using any exemplars, and treat the concept detections as if they were words under a query-likelihood model.
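As a rough illustration of the scoring described above, the following Python sketch computes a Dirichlet-smoothed query likelihood per field and combines the fields linearly. It covers only the unigram part of SDM (the ordered and unordered window features are omitted), and the toy videos, the smoothing values in `mu`, and the field weights are all assumptions made for the example, not the values used in our system.

```python
import math
from collections import Counter


def dirichlet_log_likelihood(query, doc, coll_counts, coll_len, mu):
    """Unigram log query likelihood of one document with Dirichlet smoothing."""
    doc_counts = Counter(doc)
    score = 0.0
    for term in query:
        # Background probability; the small floor avoids log(0) for unseen terms.
        p_bg = max(coll_counts.get(term, 0) / coll_len, 1e-9)
        score += math.log((doc_counts[term] + mu * p_bg) / (len(doc) + mu))
    return score


def combine_fields(field_scores, weights):
    """Linear combination of per-field scores."""
    return sum(weights[name] * s for name, s in field_scores.items())


# Toy data (hypothetical): two videos, each with ASR and OCR token streams.
videos = {
    "v1": {"asr": ["fix", "bike", "tire", "pump"], "ocr": ["bike", "shop"]},
    "v2": {"asr": ["bake", "cake", "oven", "sugar"], "ocr": ["recipe"]},
}
query = ["bike", "tire"]
mu = {"asr": 100.0, "ocr": 10.0}    # assumed per-modality smoothing
weights = {"asr": 0.7, "ocr": 0.3}  # assumed field weights

# Collection statistics, kept separately for each field.
coll = {}
for field in ("asr", "ocr"):
    tokens = [t for v in videos.values() for t in v[field]]
    coll[field] = (Counter(tokens), len(tokens))

scores = {}
for vid, fields in videos.items():
    per_field = {
        f: dirichlet_log_likelihood(query, fields[f], coll[f][0], coll[f][1], mu[f])
        for f in fields
    }
    scores[vid] = combine_fields(per_field, weights)

ranking = sorted(scores, key=scores.get, reverse=True)
```

Under the same scheme, concept detections could be added as one more field whose "tokens" are the detected concept names, which is the sense in which detections are treated as words.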

010Ex System:
Our 010Ex system employed the following low-level features:
DenseSIFT
HessianAffine
ColorSift
TCH
DenseTrack_hog
DenseTrack_mbh
CMUAudio
It also includes the following high-level features:
DeepCaffe
OverFeat
Action Concepts
ASR/OCR features
For low-level features, we train SVM classifiers with a fast histogram intersection kernel. For high-level features, we train SVM classifiers with an RBF kernel.
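For readers unfamiliar with the two kernels named above, the following Python sketch shows how each is computed on a pair of feature vectors. It shows only the plain kernel definitions, not the fast evaluation scheme; the sample vectors and the `gamma` value are assumptions made for the example.

```python
import math


def histogram_intersection(x, y):
    """Histogram intersection kernel: sum of bin-wise minima."""
    return sum(min(a, b) for a, b in zip(x, y))


def rbf(x, y, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def gram_matrix(samples, kernel):
    """Pairwise kernel values for a list of feature vectors."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Hypothetical L1-normalized bag-of-words histograms.
h1 = [0.5, 0.5, 0.0]
h2 = [0.25, 0.25, 0.5]
k_hi = histogram_intersection(h1, h2)

# Hypothetical high-level (concept score) vectors; gamma is an assumed value.
c1 = [0.9, 0.1]
c2 = [0.8, 0.3]
k_rbf = rbf(c1, c2, gamma=1.0)

gram = gram_matrix([h1, h2], histogram_intersection)
```

A Gram matrix computed this way can be passed to an SVM trainer that accepts precomputed kernels.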

100Ex System:
Our 100Ex system employed the same set of low-level features as our 010Ex system, but with different representations:
DenseSIFT_fisher
DenseTrack_hog_fisher
DenseTrack_mbh_fisher
ColorSift_fisher
DenseSIFT_stcoding
HessianAffine_stcoding
ColorSift_stcoding
TCH_stcoding
DenseTrack_hog_stcoding
DenseTrack_mbh_stcoding
CMUAudio
It also includes the following high-level features, which are the same as in our 010Ex system:
DeepCaffe
OverFeat
Action Concepts
For Fisher features, we apply a linear SVM directly; for stcoding features, we first apply a feature mapping technique and then train a linear SVM.
For high-level features, we train SVM classifiers with an RBF kernel.
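The idea behind pairing a feature mapping with a linear SVM can be sketched as follows: an explicit map is chosen so that a plain dot product in the mapped space reproduces (or approximates) a nonlinear kernel, and a linear classifier is then trained on the mapped vectors. The Python example below uses a unary encoding, which is exact for the histogram intersection kernel on integer-valued bins. This encoding is a stand-in chosen for illustration; it is an assumption and not necessarily the mapping used in our system.

```python
def unary_map(hist, levels):
    """Map integer bin counts in [0, levels] to a binary vector.

    Each bin with count c becomes c ones followed by (levels - c) zeros.
    """
    out = []
    for count in hist:
        out.extend([1.0] * count + [0.0] * (levels - count))
    return out


def dot(u, v):
    """Plain dot product, i.e. what a linear SVM computes on mapped features."""
    return sum(a * b for a, b in zip(u, v))


def intersection(x, y):
    """Histogram intersection kernel on the original features."""
    return sum(min(a, b) for a, b in zip(x, y))


# Hypothetical quantized histograms with counts between 0 and 4.
x = [3, 0, 2]
y = [1, 4, 2]
levels = 4

linear_value = dot(unary_map(x, levels), unary_map(y, levels))
kernel_value = intersection(x, y)
```

Here `linear_value` and `kernel_value` agree, so a linear model on the mapped features behaves like a kernel SVM on the original histograms while keeping linear-time training and testing.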

Contact Information:
Hui Cheng
hui.cheng@sri.com
Jingen Liu
jingen.liu@sri.com
Gary Gan
chuanyong.gan@sri.com


-------------------------------------
Section 2  Training data and knowledge sources 
-------------------------------------
We use the NIST SIN training dataset to train our SIN detectors.
We also use ImageNet to train our 1000 OverFeat and DeepCaffe concept features.
We trained our action concept detectors on third-party open-source videos.

-------------------------------------
Section 3  References 
-------------------------------------
1. A. Vedaldi and B. Fulkerson, VLFeat: An Open and Portable Library of Computer Vision Algorithms
2. A. Vedaldi and A. Zisserman, Sparse Kernel Approximations for Efficient Classification and Detection, CVPR 2012
3. LIBLINEAR, http://www.csie.ntu.edu.tw/~cjlin/liblinear/
4. LIBSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/