Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue
Despite its significant effectiveness in adversarial training approaches to multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and policy generator. During the optimization...
Main Authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Published / Created: | MDPI AG, 2021-07-01 |
Series: | Applied Sciences |
Subjects: | |
Online Access: | https://www.mdpi.com/2076-3417/11/14/6624 |