A Sample Aggregation Approach to Experiences Replay of Dyna-Q Learning

Bibliographic Details
Main Authors:	Haobin Shi, Shike Yang, Kao-Shing Hwang, Jialin Chen, Mengkai Hu, Hengsheng Zhang
Format:	Article
Language:	English
Published:	IEEE 2018-01-01
Series:	IEEE Access
Subjects:	Dyna-Q; MinHash; Chinese restaurant process; FSA-CRP model; prediction
Online Access:	https://ieeexplore.ieee.org/document/8383982/

author	Haobin Shi; Shike Yang; Kao-Shing Hwang; Jialin Chen; Mengkai Hu; Hengsheng Zhang
collection	DOAJ
description	In complex environments, the learning efficiency of reinforcement learning methods often degrades in large-scale or continuous spaces, which give rise to the well-known curse of dimensionality. To address this problem and improve learning efficiency, this paper introduces an aggregation method, a framework of sample aggregation based on the Chinese restaurant process (CRP) named FSA-CRP, that clusters experiential samples, each represented as a quadruple of the current state, action, next state, and obtained reward. The proposed algorithm also applies a similarity estimation method, MinHash, to calculate the similarity between samples. Moreover, to further improve learning efficiency, an experience-sharing Dyna learning algorithm based on sample/cluster prediction is proposed. While the agent learns the value function of the current state, it obtains the clustering results, and the value function of the sample is merged with the cluster's original value function to form the cluster's updated value function. In indirect learning (planning) for Dyna-Q, the learning agent looks for the most likely branches of the constructed FSA-CRP model to improve learning efficiency. These branches are selected by an improved action/sample selection algorithm, which uses the probability that a sample appears in its cluster to select simulated experiences for indirect learning. To verify the validity and applicability of the proposed method, experiments are conducted on a simulated maze and a cart-pole system. The results demonstrate that the proposed method can effectively accelerate the learning process.
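The abstract names three mechanisms: MinHash similarity between experience quadruples, CRP-based clustering of those quadruples, and probability-weighted selection of simulated experiences for Dyna-Q planning. The Python sketch below shows one way these pieces might fit together; it is not the authors' FSA-CRP algorithm. Assumptions not taken from the paper: each quadruple is encoded as a set of feature tokens; a cluster's representative is the MinHash signature of its first sample; a new sample joins an existing cluster with weight proportional to cluster size times estimated similarity, or opens a new cluster with weight `alpha`; the cluster value-function merging step is omitted. The names `tokens`, `CRPClusters`, `sample_for_planning`, and `dyna_q_planning` are hypothetical.

```python
import hashlib
import random
from collections import Counter, defaultdict

NUM_HASHES = 64  # hypothetical MinHash signature length


def tokens(quad):
    """Encode an (s, a, s_next, r) quadruple as a set of feature tokens (assumed encoding)."""
    s, a, s_next, r = quad
    feats = {f"s{i}={v}" for i, v in enumerate(s)}
    feats |= {f"n{i}={v}" for i, v in enumerate(s_next)}
    feats |= {f"a={a}", f"r={r}"}
    return feats


def minhash(feats, num_hashes=NUM_HASHES):
    """For each seeded hash function, keep the minimum hash value over the feature set."""
    return [
        min(int.from_bytes(hashlib.blake2b(f"{seed}:{f}".encode(), digest_size=8).digest(), "big")
            for f in feats)
        for seed in range(num_hashes)
    ]


def similarity(sig_a, sig_b):
    """The fraction of matching signature slots estimates the Jaccard similarity of the sets."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


class CRPClusters:
    """CRP-style clustering of experience quadruples (hypothetical, not FSA-CRP itself)."""

    def __init__(self, alpha=1.0, seed=0):
        self.alpha = alpha            # concentration: weight for opening a new cluster (must be > 0)
        self.rng = random.Random(seed)
        self.reps = []                # MinHash signature of each cluster's first sample
        self.counts = []              # per-cluster Counter of stored quadruples

    def add(self, quad):
        """Seat a quadruple: existing cluster k ~ n_k * similarity, new cluster ~ alpha."""
        sig = minhash(tokens(quad))
        weights = [sum(c.values()) * similarity(sig, rep)
                   for rep, c in zip(self.reps, self.counts)]
        weights.append(self.alpha)
        k = self.rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(self.reps):       # the last slot means "open a new cluster"
            self.reps.append(sig)
            self.counts.append(Counter())
        self.counts[k][quad] += 1
        return k

    def sample_for_planning(self):
        """Pick a cluster by size, then a quadruple by its occurrence count in that cluster."""
        sizes = [sum(c.values()) for c in self.counts]
        k = self.rng.choices(range(len(sizes)), weights=sizes)[0]
        quads = list(self.counts[k])
        return self.rng.choices(quads, weights=[self.counts[k][q] for q in quads])[0]


def dyna_q_planning(Q, model, n_steps, actions, alpha_lr=0.1, gamma=0.95):
    """Indirect learning: apply the Q-learning update to simulated experiences from the model."""
    for _ in range(n_steps):
        s, a, s_next, r = model.sample_for_planning()
        best_next = max(Q[(s_next, b)] for b in actions)
        Q[(s, a)] += alpha_lr * (r + gamma * best_next - Q[(s, a)])


# Usage: cluster three toy maze transitions, then run 200 planning updates.
model = CRPClusters(alpha=0.5, seed=1)
Q = defaultdict(float)
for quad in [((0, 0), 1, (0, 1), 0.0), ((0, 1), 1, (0, 2), 1.0), ((0, 0), 1, (0, 1), 0.0)]:
    model.add(quad)
dyna_q_planning(Q, model, n_steps=200, actions=[0, 1])
print(Q[((0, 1), 1)])
```

Choosing a cluster in proportion to its size and then a quadruple in proportion to its count within that cluster gives every stored occurrence the same overall probability. A variant closer to the paper's "most likely branches" idea might weight clusters differently; that choice is left open here.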
first_indexed	2024-12-16T17:34:39Z
format	Article
id	doaj.art-f9efa5b853be4b558ab6226515f02f70
institution	Directory Open Access Journal
issn	2169-3536
language	English
last_indexed	2024-12-16T17:34:39Z
publishDate	2018-01-01
publisher	IEEE
record_format	Article
series	IEEE Access
citation	IEEE Access, vol. 6, pp. 37173-37184, 2018
doi	10.1109/ACCESS.2018.2847048
orcid	Haobin Shi: https://orcid.org/0000-0003-2180-8941; Kao-Shing Hwang: https://orcid.org/0000-0003-4432-8801
affiliation	Haobin Shi, Jialin Chen: School of Computer Science, Northwestern Polytechnical University, Xi’an, China
affiliation	Shike Yang, Hengsheng Zhang: Department of Software, Twentieth Research Institute of China Electronic Technology Group Corporation, Xi’an, China
affiliation	Kao-Shing Hwang: Department of Electrical Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan
affiliation	Mengkai Hu: School of Electronics Engineering and Computer Science, Peking University, Beijing, China
title	A Sample Aggregation Approach to Experiences Replay of Dyna-Q Learning
topic	Dyna-Q; MinHash; Chinese restaurant process; FSA-CRP model; prediction
url	https://ieeexplore.ieee.org/document/8383982/
