Cooperative Multi-Agent Reinforcement Learning with Conversation Knowledge for Dialogue Management

Dialogue management plays a vital role in task-oriented dialogue systems and has become an active area of research in recent years. Despite the promising results brought by deep reinforcement learning, most studies additionally require a manually developed user simulator. To avoid the time-consuming development of a simulator policy, we propose a multi-agent dialogue model in which an end-to-end dialogue manager and a user simulator are optimized simultaneously. Unlike prior work, we optimize the two agents from scratch and apply a reward-shaping technique based on adjacency-pair constraints from conversation analysis, both to speed up learning and to avoid deviation from normal human-human conversation. In addition, we generalize the one-to-one learning strategy to a one-to-many strategy, in which a dialogue manager is concurrently optimized with several user simulators, to improve the performance of the trained dialogue manager. Experimental results show that one-to-one agents trained with adjacency-pair constraints converge faster and avoid such deviation. In a cross-model evaluation involving human users, the dialogue manager trained with the one-to-many strategy achieves the best performance.


Bibliographic Details
Main Authors: Shuyu Lei, Xiaojie Wang, Caixia Yuan (Center for Intelligence of Science and Technology (CIST), Beijing University of Posts and Telecommunications, Beijing 100876, China)
Format: Article
Language: English
Published: MDPI AG, 2020-04-01
Series: Applied Sciences, Vol. 10, No. 8, Article 2740
ISSN: 2076-3417
DOI: 10.3390/app10082740
Subjects: dialogue management; user simulation; reward shaping; conversation knowledge; multi-agent reinforcement learning
Online Access: https://www.mdpi.com/2076-3417/10/8/2740