The 2020 Video to Text (VTT) task was evaluated using five main metrics:

1- BLEU metric:

  mteval-v14/mteval-v14.pl -r tv20.ref.5.bleu.xml -s tv20.src.5.bleu.xml -t runSubmissionFile.xml -d 3 --metricsMATR

Where:
  mteval : the NIST MT evaluation tool: ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-v14.pl
  tv20.ref.5.bleu.xml : the reference (ground truth) textual descriptions, in an XML format compatible with the mteval tool, using 5 references per video.
  tv20.src.5.bleu.xml : the source-language text to be translated. In the VTT task there is no source language, so its contents are identical to tv20.ref.5.bleu.xml; mteval uses this file only to determine the number of source sentences (videos) to be translated.
  runSubmissionFile.xml : the run submission file, in an XML format compatible with the mteval tool.
  -d 3 : determines the output files (document/video level, system level, and detailed scores). Using -d 3 produces three files: -doc.scr, -sys.scr, and a scores file.

The tool also produces results for the NIST metric, which is unbounded and as a result is not very popular; it is a supported metric that addresses some shortcomings of BLEU. However, none of the NIST metric scores were reported as official results at this point.

2- METEOR metric:

  java -Xmx2G -jar meteor-*.jar runSubmissionFile.txt tv20.ref.5.meteor -l en -norm -r 5 -t adq

Where:
  runSubmissionFile.txt : the run submission file, in a format compatible with the METEOR tool.
  tv20.ref.5.meteor : the reference (ground truth) in txt format compatible with the METEOR tool, using 5 references. The -r parameter sets how many references are used, which is 5 for the 2020 VTT task.

The tool can be downloaded from http://www.cs.cmu.edu/~alavie/METEOR/

3- STS experimental metric (Semantic Textual Similarity): To score the similarity between two textual descriptions, use this API call:
  wget "http://swoogle.umbc.edu/StsService/GetStsSim?operation=api&phrase1=$s1&phrase2=$s2" --wait=5 --user-agent='NIST - TRECVID' -q -O - 2>&1

Where:
  $s1 : a run-submitted textual description sentence for a single video.
  $s2 : the ground truth textual description sentence.

4- CIDEr metric:
  - Download the CIDEr code available at https://github.com/vrama91/cider.
  - The file cidereval.py evaluates the submission against the provided ground truth.
  - The code uses the params.json file to determine the input and output files. Edit params.json so that "refName" is "vtt.gt.cider.json" and "candName" is your JSON submission filename. Ensure that "idf" is set to "corpus".
  - For further information on how to run the sample code, refer to README.md.

5- SPICE metric: The SPICE code can be downloaded at https://panderson.me/spice/ and run as follows:

  java -Xmx8G -jar spice-1.0.jar runSubmissionFile.json -out outputFile.json -cache cache_dir

Where:
  runSubmissionFile.json : the run submission file, in a format compatible with the SPICE tool. Each submission file includes the ground truth against which the captions are evaluated.
  cache_dir : an initially empty directory where the parsed reference captions are stored, for faster evaluation when multiple runs are scored. The -cache parameter is optional.
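The CIDEr configuration step above can be scripted rather than edited by hand. The sketch below writes a params.json with the three settings the task requires ("refName", "candName", "idf"); the "pathToData" and "resultFile" entries and the my_run_submission.json filename are assumptions for illustration, not values specified by the task.

```python
import json

# Minimal sketch of preparing CIDEr's params.json for a VTT run.
# "refName", "candName", and "idf" follow the task guidelines; the
# remaining keys and filenames are assumed placeholders - adjust them
# to match your local copy of the CIDEr repository.
params = {
    "pathToData": "data/",                    # assumed data directory
    "refName": "vtt.gt.cider.json",           # ground truth file (per guidelines)
    "candName": "my_run_submission.json",     # hypothetical submission filename
    "resultFile": "results.json",             # assumed output filename
    "idf": "corpus",                          # required by the task guidelines
}

with open("params.json", "w") as f:
    json.dump(params, f, indent=2)
```

After writing params.json, cidereval.py can be run as described in the CIDEr README.md.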