A Structured Review of the Validity of BLEU
The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation technique—in other words, whether BLEU scores correlate with r...
Main Author: | |
---|---|
Format: | Article |
Language: | English |
Published: | The MIT Press, 2018-09-01 |
Series: | Computational Linguistics |
Online Access: | https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00322 |