Deep Q-Learning Network with Bayesian-Based Supervised Expert Learning

Deep reinforcement learning (DRL) algorithms interact with the environment and have achieved considerable success in several decision-making problems. However, DRL requires a significant amount of data before it can reach adequate performance, which can limit its applicability when agents must learn in a real-world environment. Therefore, some algorithms combine DRL agents with supervised learning to leverage additional prior knowledge; in particular, a deep Q-learning network has been integrated with a behavioral cloning model that exploits supervised learning as prior learning. The algorithm proposed in this study builds on these methods and updates the loss function of the existing technique into a Bayesian approach. The supervised loss function used in existing algorithms and the Bayesian loss function proposed in this study differ in their utilization of prior knowledge: the former does not use it, whereas the latter does, much as the cross entropy may or may not be symmetric. In experiments on various OpenAI Gym environments, such as CartPole and MountainCar, learning convergence was improved. In particular, the proposed method achieves fairly stable learning during the early stage, when learning in a sparse environment is uncertain.
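
To make the combined loss the abstract describes concrete, below is a minimal sketch of a deep Q-learning update augmented with a supervised behavioral-cloning term on expert data, i.e., the baseline structure that the paper then modifies into a Bayesian loss. It assumes PyTorch; QNetwork, combined_loss, bc_weight, and all hyperparameters are illustrative assumptions for exposition, not the paper's implementation.

    # Minimal sketch (assumed PyTorch): a one-step DQN TD loss combined with a
    # supervised cross-entropy (behavioral cloning) term on expert transitions.
    # All names and weights here are hypothetical, not taken from the paper.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class QNetwork(nn.Module):
        def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, n_actions),
            )

        def forward(self, obs: torch.Tensor) -> torch.Tensor:
            return self.net(obs)  # one Q-value per action

    def combined_loss(q_net, target_net, batch, expert_batch,
                      gamma: float = 0.99, bc_weight: float = 0.5):
        obs, act, rew, next_obs, done = batch  # agent data; act: long, done: float 0/1
        e_obs, e_act = expert_batch            # expert (state, action) pairs; e_act: long

        # Standard one-step TD error against a frozen target network.
        q = q_net(obs).gather(1, act.unsqueeze(1)).squeeze(1)
        with torch.no_grad():
            next_q = target_net(next_obs).max(dim=1).values
            target = rew + gamma * (1.0 - done) * next_q
        td_loss = F.smooth_l1_loss(q, target)

        # Supervised expert term: treat the Q-values as logits and apply
        # cross-entropy against the expert's actions (behavioral cloning).
        bc_loss = F.cross_entropy(q_net(e_obs), e_act)

        return td_loss + bc_weight * bc_loss

    # Example shapes for CartPole-v1 (4-dim observation, 2 actions):
    # q_net, target_net = QNetwork(4, 2), QNetwork(4, 2)

In the paper's proposal, the plain cross-entropy term above is replaced by a Bayesian loss that incorporates prior knowledge; the sketch only shows the conventional structure being modified.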

Bibliographic Details
Main Author: Chayoung Kim (College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Korea)
Format: Article
Language: English
Published: MDPI AG, 2022-10-01
Series: Symmetry (ISSN 2073-8994)
DOI: 10.3390/sym14102134
Subjects: deep reinforcement learning; deep Q-learning network; behavioral cloning model; expert supervised learning; Bayesian approach
Online Access: https://www.mdpi.com/2073-8994/14/10/2134