Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics

End-to-end dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human–human dialog corpora. However, despite its fundamental importance for systematically improving the performance of such systems, automatic evaluation of...

Bibliographic Details
Main Authors:	D'Haro, Luis Fernando; Banchs, Rafael E.; Hori, Chiori; Li, Haizhou
Other Authors:	School of Computer Science and Engineering
Format:	Journal Article
Language:	English
Published:	2021
Subjects:	Engineering::Computer science and engineering; Automatic Evaluation Metrics; Dialog Systems
Online Access:	https://hdl.handle.net/10356/151218

collection	NTU
description	End-to-end dialog systems are gaining interest due to recent advances in deep neural networks and the availability of large human–human dialog corpora. However, despite its fundamental importance for systematically improving the performance of such systems, automatic evaluation of the generated dialog utterances is still an unsolved problem. Indeed, most of the proposed objective metrics show low correlation with human evaluations. In this paper, we evaluate a two-dimensional evaluation metric designed to operate at the sentence level, which considers the syntactic and semantic information carried by the answers generated by an end-to-end dialog system with respect to a set of references. The proposed metric, when applied to outputs generated by the systems participating in track 2 of the DSTC-6 challenge, shows a higher correlation with human evaluations (up to 12.8% relative improvement at the system level) than the best of the alternative state-of-the-art automatic metrics currently available.
id	ntu-10356/151218
institution	Nanyang Technological University
spelling	D'Haro, L. F., Banchs, R. E., Hori, C. & Li, H. (2018). Automatic evaluation of end-to-end dialog systems with adequacy-fluency metrics. Computer Speech and Language, 55, 200-215. https://dx.doi.org/10.1016/j.csl.2018.12.004. ISSN: 0885-2308. ORCID: 0000-0002-4201-7578. Scopus ID: 2-s2.0-85059347815. Record date: 2021-07-02T03:31:40Z. © 2018 Elsevier Ltd. All rights reserved.

Similar Items