Reward estimation with scheduled knowledge distillation for dialogue policy learning

Formulating dialogue policy learning as a reinforcement learning (RL) task enables a dialogue system to learn to act optimally by interacting with humans. However, typical RL-based methods suffer from challenges such as sparse and delayed rewards. Besides, with the user goal unavailable in real scenarios...

Bibliographic Details
Main Authors:	Junyan Qiu, Haidong Zhang, Yiping Yang
Format:	Article
Language:	English
Published:	Taylor & Francis Group 2023-12-01
Series:	Connection Science
Subjects:	reinforcement learning; dialogue policy learning; curriculum learning; knowledge distillation
Online Access:	http://dx.doi.org/10.1080/09540091.2023.2174078

Similar Items