Evaluating Attribution in Dialogue Systems: The BEGIN Benchmark

Abstract: Knowledge-grounded dialogue systems powered by large language models often generate responses that, while fluent, are not attributable to a relevant source of information. Progress towards models that do not exhibit this issue requires evaluation metrics that can quantify its...

Bibliographic Details
Main Authors: Nouha Dziri, Hannah Rashkin, Tal Linzen, David Reitter
Format: Article
Language: English
Published: The MIT Press, 2022-01-01
Series: Transactions of the Association for Computational Linguistics
Online Access: https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00506/113023/Evaluating-Attribution-in-Dialogue-Systems-The