A Structured Review of the Validity of BLEU

The BLEU metric has been widely used in NLP for over 15 years to evaluate NLP systems, especially in machine translation and natural language generation. I present a structured review of the evidence on whether BLEU is a valid evaluation technique—in other words, whether BLEU scores correlate with r...


Bibliographic Details
Main Author: Ehud Reiter
Format: Article
Language: English
Published: The MIT Press 2018-09-01
Series: Computational Linguistics
Online Access: https://www.mitpressjournals.org/doi/pdf/10.1162/coli_a_00322