The video-to-text pilot task evaluation was run using two main official metrics (BLEU and METEOR) and one experimental semantic similarity metric (STS):

1- BLEU metric:

mteval-v14/mteval-v14.pl -r tv16.ref.xml -s tv16.src.xml -t runSubmissionFile.xml -d 3 --metricsMATR

Where:
mteval : the NIST MT evaluation tool: ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v14.pl
tv16.ref.xml : the reference (ground truth, GT) textual descriptions in an XML format compatible with the mteval tool.
tv16.src.xml : the source-language text to be translated. In the VTT task there is no source language, so its contents are the same as tv16.ref.xml; the mteval tool uses this file only to determine the number of source sentences (videos) to be translated.
runSubmissionFile : the run submission file in an XML format compatible with the mteval tool.
-d 3 : controls the output files (document/video level, system level, and detailed scores). Using -d 3 produces three files: a -doc.scr file, a -sys.scr file, and a detailed scores file.

The tool also reports the NIST metric, which addresses some shortcomings of BLEU but is unbounded and therefore less widely used. We did not report any NIST scores as official results at this point.

2- METEOR metric:

java -Xmx2G -jar meteor-*.jar runSubmissionFile.txt tv16.ref.meteor -l en -norm -r 2 -t adq

Where:
runSubmissionFile : the run submission file in a format compatible with the METEOR tool.
tv16.ref.meteor : the reference (GT) descriptions in a plain-text format compatible with the METEOR tool.

The METEOR tool can be downloaded from http://www.cs.cmu.edu/~alavie/METEOR/

3- STS experimental metric (semantic textual similarity):

wget "http://swoogle.umbc.edu/StsService/GetStsSim?operation=api&phrase1=$s1&phrase2=$s2" --wait=5 --user-agent='NIST - TRECVID' -q -O - 2>&1

Where:
$s1 : a textual description sentence from a submitted run.
$s2 : the corresponding GT textual description sentence.
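For illustration only, the following Python sketch shows one way the mteval and METEOR commands above could be driven over a set of submitted runs. It is not the official scoring pipeline; the runs/ directory layout and the file extensions used to find submissions are assumptions.

# Illustrative sketch: loop the mteval (BLEU/NIST) and METEOR commands above
# over a directory of submitted runs. The runs/ layout and file extensions
# are assumptions, not part of the official evaluation setup.
import glob
import subprocess

# BLEU (and NIST) via the mteval tool; -d 3 writes the -doc.scr, -sys.scr
# and detailed-scores files for each submission.
for run_xml in sorted(glob.glob("runs/*.xml")):
    subprocess.run(["perl", "mteval-v14/mteval-v14.pl",
                    "-r", "tv16.ref.xml", "-s", "tv16.src.xml",
                    "-t", run_xml, "-d", "3", "--metricsMATR"],
                   check=True)

# METEOR in adequacy mode ("-t adq"), normalized, with 2 references per video.
meteor_jar = glob.glob("meteor-*.jar")[0]
for run_txt in sorted(glob.glob("runs/*.txt")):
    subprocess.run(["java", "-Xmx2G", "-jar", meteor_jar,
                    run_txt, "tv16.ref.meteor",
                    "-l", "en", "-norm", "-r", "2", "-t", "adq"],
                   check=True)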
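The STS query above scores a single sentence pair. As a sketch only, the following Python fragment applies it to a whole run by pairing each submitted sentence with its GT description, querying the UMBC STS service, and averaging the returned scores. The plain-text file names, the one-sentence-per-line pairing, and the averaging step are assumptions for illustration; the service is assumed to return the similarity value as plain text.

# Illustrative sketch: score a run with the UMBC STS web service used above.
import time
import urllib.parse
import urllib.request

STS_URL = "http://swoogle.umbc.edu/StsService/GetStsSim"

def sts_score(s1, s2):
    # Query the STS service for the semantic similarity of two sentences,
    # mirroring the wget call above (operation=api, phrase1, phrase2).
    params = urllib.parse.urlencode({"operation": "api",
                                     "phrase1": s1, "phrase2": s2})
    req = urllib.request.Request(STS_URL + "?" + params,
                                 headers={"User-Agent": "NIST - TRECVID"})
    with urllib.request.urlopen(req) as resp:
        return float(resp.read().decode("utf-8").strip())

def score_run(run_file, ref_file):
    # Average STS over all (submitted, reference) sentence pairs, assuming
    # both files list one sentence per line in the same video order.
    scores = []
    with open(run_file) as run, open(ref_file) as ref:
        for s1, s2 in zip(run, ref):
            scores.append(sts_score(s1.strip(), s2.strip()))
            time.sleep(5)  # mirror wget's --wait=5 to stay polite to the service
    return sum(scores) / len(scores)

if __name__ == "__main__":
    # Hypothetical file names for illustration only.
    print(score_run("runSubmissionFile.txt", "tv16.ref.txt"))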