Variational Reward Estimator Bottleneck: Towards Robust Reward Estimator for Multidomain Task-Oriented Dialogue

Despite its significant effectiveness in adversarial training approaches to multidomain task-oriented dialogue systems, adversarial inverse reinforcement learning of the dialogue policy frequently fails to balance the performance of the reward estimator and policy generator. During the optimization...


Bibliographic details
Main authors: Jeiyoon Park, Chanhee Lee, Chanjun Park, Kuekyeng Kim, Heuiseok Lim
Format: Article
Language: English
Published / Created: MDPI AG 2021-07-01
Series: Applied Sciences
Subjects:
Online access: https://www.mdpi.com/2076-3417/11/14/6624