Cooperative Multi-Agent Reinforcement Learning with Conversation Knowledge for Dialogue Management
Dialogue management plays a vital role in task-oriented dialogue systems and has become an active area of research in recent years. Despite the promising results brought by deep reinforcement learning, most studies additionally require a manually developed user simulator. To avoid the time-consuming development of a simulator policy, we propose a multi-agent dialogue model in which an end-to-end dialogue manager and a user simulator are optimized simultaneously. Unlike prior work, we optimize the two agents from scratch and apply reward shaping based on adjacency-pair constraints from conversation analysis, both to speed up learning and to keep the agents from deviating from normal human-human conversation. In addition, we generalize the one-to-one learning strategy to a one-to-many strategy, in which a dialogue manager is concurrently optimized with several user simulators, to improve the performance of the trained dialogue manager. Experimental results show that one-to-one agents trained with adjacency-pair constraints converge faster and avoid such deviation. In a cross-model evaluation involving human users, the dialogue manager trained with the one-to-many strategy achieves the best performance.
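The abstract describes two mechanisms concretely enough to sketch: a reward-shaping term that scores each system act against the adjacency-pair constraints licensed by the preceding user act, and a one-to-many training loop in which the dialogue manager is paired with a user simulator sampled from a pool each episode. The minimal Python sketch below illustrates both ideas under stated assumptions: the act inventory, the constraint table, the reward magnitudes, and the random stand-in policies are all illustrative, not details taken from the paper.

```python
import random

# Illustrative dialogue-act inventory (an assumption, not the paper's).
ACTS = ["greeting", "question", "answer", "request", "inform", "deny", "goodbye"]

# Adjacency pairs from conversation analysis: a first pair part licenses
# a restricted set of second pair parts (e.g., question -> answer).
ADJACENCY_PAIRS = {
    "greeting": {"greeting"},
    "question": {"answer"},
    "request": {"inform", "deny"},
    "goodbye": {"goodbye"},
}

def shaping_term(prev_act, curr_act, bonus=0.1, penalty=-0.1):
    """Dense shaping reward: positive if curr_act is a licensed second
    pair part of prev_act, negative on a violation, zero otherwise."""
    expected = ADJACENCY_PAIRS.get(prev_act)
    if expected is None:
        return 0.0  # no adjacency-pair constraint on this act
    return bonus if curr_act in expected else penalty

def run_episode(manager_policy, simulator_policy, max_turns=10):
    """Alternate user and system turns, accumulating the manager's shaped
    reward; the sparse task-success reward is omitted for brevity."""
    total, last_sys_act = 0.0, None
    for _ in range(max_turns):
        user_act = simulator_policy(last_sys_act)
        sys_act = manager_policy(user_act)
        total += shaping_term(user_act, sys_act)
        last_sys_act = sys_act
        if sys_act == "goodbye":
            break
    return total

def random_policy(_prev_act):
    # Stand-in for a learned policy network.
    return random.choice(ACTS)

# One-to-many strategy: sample a simulator from a pool each episode so the
# manager cannot overfit the quirks of any single simulator.
simulator_pool = [random_policy, random_policy, random_policy]
for episode in range(3):
    simulator = random.choice(simulator_pool)
    print(f"episode {episode}: shaped return = {run_episode(random_policy, simulator):+.2f}")
```

In the paper's setting, the two random stand-ins would be the jointly trained dialogue-manager and simulator policies, and the shaping term would be added to the task reward each turn rather than replacing it.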
Main Authors: | Shuyu Lei, Xiaojie Wang, Caixia Yuan |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-04-01 |
Series: | Applied Sciences |
Subjects: | dialogue management; user simulation; reward shaping; conversation knowledge; multi-agent reinforcement learning |
Online Access: | https://www.mdpi.com/2076-3417/10/8/2740 |
author | Shuyu Lei; Xiaojie Wang; Caixia Yuan |
collection | DOAJ |
description | Dialogue management plays a vital role in task-oriented dialogue systems and has become an active area of research in recent years. Despite the promising results brought by deep reinforcement learning, most studies additionally require a manually developed user simulator. To avoid the time-consuming development of a simulator policy, we propose a multi-agent dialogue model in which an end-to-end dialogue manager and a user simulator are optimized simultaneously. Unlike prior work, we optimize the two agents from scratch and apply reward shaping based on adjacency-pair constraints from conversation analysis, both to speed up learning and to keep the agents from deviating from normal human-human conversation. In addition, we generalize the one-to-one learning strategy to a one-to-many strategy, in which a dialogue manager is concurrently optimized with several user simulators, to improve the performance of the trained dialogue manager. Experimental results show that one-to-one agents trained with adjacency-pair constraints converge faster and avoid such deviation. In a cross-model evaluation involving human users, the dialogue manager trained with the one-to-many strategy achieves the best performance. |
format | Article |
id | doaj.art-8410c543d16a49d1aa690bd47f2dae52 |
institution | Directory Open Access Journal |
issn | 2076-3417 |
language | English |
publishDate | 2020-04-01 |
publisher | MDPI AG |
record_format | Article |
series | Applied Sciences |
spelling | MDPI AG, Applied Sciences (ISSN 2076-3417), 2020-04-01, vol. 10, no. 8, art. 2740, doi:10.3390/app10082740. Shuyu Lei, Xiaojie Wang, Caixia Yuan: Center for Intelligence of Science and Technology (CIST), Beijing University of Posts and Telecommunications, Beijing 100876, China |
title | Cooperative Multi-Agent Reinforcement Learning with Conversation Knowledge for Dialogue Management |
topic | dialogue management; user simulation; reward shaping; conversation knowledge; multi-agent reinforcement learning |
url | https://www.mdpi.com/2076-3417/10/8/2740 |