Deep Q-Learning Network with Bayesian-Based Supervised Expert Learning
Deep reinforcement learning (DRL) algorithms interact with the environment and have achieved considerable success in several decision-making problems. However, DRL requires a large amount of data before it can achieve adequate performance, and its applicability can be limited when agents must learn directly in a real-world environment. Therefore, some algorithms combine DRL agents with supervised learning to leverage prior knowledge; in particular, some integrate a deep Q-learning network (DQN) with a behavioral cloning model that exploits supervised learning as prior learning. The algorithm proposed in this study builds on these methods and replaces the supervised loss function of the existing technique with a Bayesian one. The two loss functions differ in their use of prior knowledge: the cross-entropy loss of the existing algorithms is symmetric and uses no prior, whereas the proposed Bayesian loss incorporates one. Across various OpenAI Gym environments, such as CartPole and MountainCar, the proposed method improved learning convergence. In particular, it achieves fairly stable learning in the early stage, when learning in a sparse environment is most uncertain.
Main Author: | Chayoung Kim
---|---
Author Affiliation: | College of Liberal Arts and Interdisciplinary Studies, Kyonggi University, 154-42 Gwanggyosan-ro, Yeongtong-gu, Suwon-si 16227, Korea
Format: | Article
Language: | English
Published: | MDPI AG, 2022-10-01
Series: | Symmetry
ISSN: | 2073-8994
DOI: | 10.3390/sym14102134
Collection: | DOAJ (Directory of Open Access Journals)
Subjects: | deep reinforcement learning; deep Q-learning network; behavioral cloning model; expert supervised learning; Bayesian approach
Online Access: | https://www.mdpi.com/2073-8994/14/10/2134
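The abstract describes a DQN whose temporal-difference loss is augmented with a supervised expert (behavioral cloning) term. The sketch below is a minimal, hypothetical illustration of that combination, assuming PyTorch; the network, batch layout, and the weighting factor `lam` are illustrative assumptions, not the paper's actual implementation.

```python
# Sketch of a DQN loss augmented with a supervised expert term, in the
# spirit of the behavioral-cloning-based methods the abstract cites.
# All names and hyperparameters are illustrative assumptions.
import torch
import torch.nn.functional as F

def combined_loss(q_net, target_net, batch, expert_batch, gamma=0.99, lam=1.0):
    # Standard one-step TD loss on agent transitions.
    s, a, r, s_next, done = batch
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        q_next = target_net(s_next).max(dim=1).values
        td_target = r + gamma * (1.0 - done) * q_next
    td_loss = F.smooth_l1_loss(q_sa, td_target)

    # Supervised expert term: treat the expert's action as a class label
    # and apply cross-entropy to the Q-values, as in behavioral cloning.
    s_e, a_e = expert_batch
    bc_loss = F.cross_entropy(q_net(s_e), a_e)

    return td_loss + lam * bc_loss
```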
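The abstract contrasts the symmetric cross-entropy supervised loss with a Bayesian loss that makes use of prior knowledge. The record does not give the paper's formula, so the following is only one plausible reading: fold a prior over expert actions into the supervised term, so the loss is no longer the plain symmetric cross-entropy. The function names and the prior construction are assumptions for illustration.

```python
# One illustrative reading of a "Bayesian" supervised term: weight the
# expert label with a prior over actions, so that the resulting loss is
# asymmetric in a way plain cross-entropy is not. This is an assumption,
# not the paper's exact formula.
import torch
import torch.nn.functional as F

def bayesian_expert_loss(q_values, expert_actions, action_prior):
    # q_values: (B, A) logits; expert_actions: (B,) labels;
    # action_prior: (A,) prior probabilities over actions.
    # Adding the log-prior to the logits yields a posterior-like
    # distribution proportional to prior * softmax(Q).
    log_post = F.log_softmax(q_values + torch.log(action_prior), dim=1)
    return F.nll_loss(log_post, expert_actions)

def empirical_action_prior(expert_actions, n_actions, eps=1.0):
    # Example prior: smoothed action frequencies from expert demonstrations.
    counts = torch.bincount(expert_actions, minlength=n_actions).float() + eps
    return counts / counts.sum()
```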
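For the experiments the abstract mentions on CartPole and MountainCar, a greedy evaluation loop like the following could be used. It assumes the classic `gym` reset/step API (pre-0.26) and a trained Q-network; both are assumptions for illustration, not the paper's experimental code.

```python
# Minimal Gym evaluation loop for the environments named in the abstract.
import gym
import torch

def evaluate(q_net, env_id="CartPole-v1", episodes=10):
    env = gym.make(env_id)
    returns = []
    for _ in range(episodes):
        obs, total, done = env.reset(), 0.0, False
        while not done:
            with torch.no_grad():
                # Greedy action from the learned Q-values.
                a = q_net(torch.as_tensor(obs, dtype=torch.float32)).argmax().item()
            obs, r, done, _ = env.step(a)
            total += r
        returns.append(total)
    return sum(returns) / len(returns)
```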