Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue
Despite its significant effectiveness in adversarial training approaches to multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and policy generator. During the optimization...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/11/14/6624 |