Deep Q-Learning Network with Bayesian-Based Supervised Expert Learning
Deep reinforcement learning (DRL) algorithms interact with the environment and have achieved considerable success in several decision-making problems. However, DRL requires a large amount of data before it can achieve adequate performance, and its applicability can be limited when agents must learn directly in a real-world environment. Therefore, some algorithms combine DRL agents with supervised learning to leverage prior knowledge; in particular, some integrate a deep Q-learning network (DQN) with a behavioral cloning model that exploits supervised learning as prior learning. The algorithm proposed in this study builds on these methods and replaces the supervised loss function of the existing technique with a Bayesian one. The two loss functions differ in their use of prior knowledge: the cross-entropy loss of the existing algorithms is symmetric and uses no prior, whereas the proposed Bayesian loss incorporates one. Across various OpenAI Gym environments, such as CartPole and MountainCar, the proposed method improved learning convergence. In particular, it achieves fairly stable learning in the early stage, when learning in a sparse environment is most uncertain.
Main Author: | Chayoung Kim
---|---
Author Affiliation: | College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Korea
Format: | Article
Language: | English
Published: | MDPI AG, 2022-10-01
Series: | Symmetry
ISSN: | 2073-8994
DOI: | 10.3390/sym14102134
Collection: | DOAJ (Directory of Open Access Journals)
Subjects: | deep reinforcement learning; deep Q-learning network; behavioral cloning model; expert supervised learning; Bayesian approach
Online Access: | https://www.mdpi.com/2073-8994/14/10/2134
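The abstract describes a DQN whose temporal-difference loss is augmented with a supervised expert (behavioral cloning) term. The sketch below is a minimal, hypothetical illustration of that combination, assuming PyTorch; the network, batch layout, and the weighting factor `lam` are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a DQN loss augmented with a supervised expert term, in the
# spirit of the behavioral-cloning-based methods the abstract cites.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, batch, expert_batch, gamma=0.99, lam=1.0):
    # Standard one-step TD loss on agent transitions.
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * q_next
    td_loss = F.smooth_l1_loss(q_sa, td_target)

    # Supervised expert term: treat the expert's action as a class label
    # and apply cross-entropy to the Q-values, as in behavioral cloning.
    s_e, a_e = expert_batch
    bc_loss = F.cross_entropy(q_net(s_e), a_e)

    return td_loss + lam * bc_loss
```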
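The abstract contrasts the symmetric cross-entropy supervised loss with a Bayesian loss that makes use of prior knowledge. The record does not give the paper's formula, so the following is only one plausible reading: fold a prior over expert actions into the supervised term, so the loss is no longer the plain symmetric cross-entropy. The function names and the prior construction are assumptions for illustration.

```python
# One illustrative reading of a "Bayesian" supervised term: weight the
# expert label with a prior over actions, so that the resulting loss is
# asymmetric in a way plain cross-entropy is not. This is an assumption,
# not the paper's exact formula.
import torch
import torch.nn.functional as F

def bayesian_expert_loss(q_values, expert_actions, action_prior):
    # q_values: (B, A) logits; expert_actions: (B,) labels;
    # action_prior: (A,) prior probabilities over actions.
    # Adding the log-prior to the logits yields a posterior-like
    # distribution proportional to prior * softmax(Q).
    log_post = F.log_softmax(q_values + torch.log(action_prior), dim=1)
    return F.nll_loss(log_post, expert_actions)

def empirical_action_prior(expert_actions, n_actions, eps=1.0):
    # Example prior: smoothed action frequencies from expert demonstrations.
    counts = torch.bincount(expert_actions, minlength=n_actions).float() + eps
    return counts / counts.sum()
```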
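For the experiments the abstract mentions on CartPole and MountainCar, a greedy evaluation loop like the following could be used. It assumes the classic `gym` reset/step API (pre-0.26) and a trained Q-network; both are assumptions for illustration, not the paper's experimental code.

```python
# Minimal Gym evaluation loop for the environments named in the abstract.
import gym
import torch

def evaluate(q_net, env_id="CartPole-v1", episodes=10):
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            with torch.no_grad():
                # Greedy action from the learned Q-values.
                a = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
            obs, r, done, _ = env.step(a)
            total += r
        returns.append(total)
    return sum(returns) / len(returns)
```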