Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue

Despite its significant effectiveness in adversarial training approaches to multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and policy generator. During the optimization...


Bibliographic Details
Main Authors: Jeiyoon Park, Chanhee Lee, Chanjun Park, Kuekyeng Kim, Heuiseok Lim
Format: Article
Language: English
Published: MDPI AG 2021-07-01
Series: Applied Sciences
Online Access: https://www.mdpi.com/2076-3417/11/14/6624