Latent Structure Matching for Knowledge Transfer in Reinforcement Learning


Bibliographic Details
Main Authors: Yi Zhou, Fenglei Yang
Format: Article
Language: English
Published: MDPI AG 2020-02-01
Series: Future Internet
Subjects: latent structure matching; reinforcement learning; transfer learning; action advice; policy improvement; mountain car
Online Access: https://www.mdpi.com/1999-5903/12/2/36
_version_ 1828287904390053888
author Yi Zhou
Fenglei Yang
author_sort Yi Zhou
collection DOAJ
description Reinforcement learning algorithms usually require a large number of empirical samples and converge slowly in practical applications. One solution is to introduce transfer learning: knowledge from well-learned source tasks can be reused to reduce the sample requirement and accelerate the learning of target tasks. However, if a poorly matched source task is selected, it will slow down or even disrupt the learning procedure. It is therefore important for knowledge transfer to select source tasks that match the target tasks closely. In this paper, a novel task matching algorithm is proposed that derives the latent structures of the tasks' value functions and aligns these structures for similarity estimation. Through latent structure matching, highly matched source tasks are selected effectively; knowledge is then transferred from them to give action advice and improve the exploration strategies of the target tasks. Experiments are conducted on a simulated navigation environment and the mountain car environment. The results show a significant performance gain for the improved exploration strategy over the traditional <inline-formula> <math display="inline"> <semantics> <mi>ϵ</mi> </semantics> </math> </inline-formula>-greedy exploration strategy. A theoretical proof is also given to verify the improvement of the exploration strategy based on latent structure matching.
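The pipeline sketched in the abstract can be illustrated in code. The sketch below is only a minimal stand-in for the paper's method, under stated assumptions: tabular Q-value tables for each task, a truncated SVD as a hypothetical way to derive a "latent structure" of a value function, an orthogonal Procrustes rotation as the alignment step, and a simple ε-greedy rule in which part of the random exploration is replaced by action advice from the best-matched source task. The function names (`latent_structure`, `structure_similarity`, `advised_epsilon_greedy`) and parameters (`k`, `advice_prob`) are illustrative inventions, not the authors' API.

```python
import numpy as np

def latent_structure(q_table, k=2):
    """Derive a low-rank embedding of a task's Q-table via truncated SVD.
    (Hypothetical stand-in for the paper's latent structure derivation.)"""
    u, s, vt = np.linalg.svd(q_table, full_matrices=False)
    return u[:, :k] * s[:k]  # state embedding scaled by singular values

def structure_similarity(a, b):
    """Align structure b to structure a with an orthogonal Procrustes
    rotation, then score similarity as the negative residual norm
    (0 means a perfect match; more negative means less similar)."""
    u, _, vt = np.linalg.svd(b.T @ a)
    r = u @ vt                      # best rotation aligning b to a
    return -np.linalg.norm(a - b @ r)

def advised_epsilon_greedy(q_target, q_source, state,
                           epsilon=0.1, advice_prob=0.5, rng=None):
    """ε-greedy exploration where a fraction of the random exploration
    is replaced by action advice from a matched source task."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        if rng.random() < advice_prob:
            return int(np.argmax(q_source[state]))   # follow source advice
        return int(rng.integers(q_target.shape[1]))  # plain random action
    return int(np.argmax(q_target[state]))           # exploit target estimate
```

A full loop would compute `latent_structure` for every candidate source task, pick the one with the highest `structure_similarity` to the target, and pass its Q-table to `advised_epsilon_greedy` during early learning.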
first_indexed 2024-04-13T09:55:56Z
format Article
id doaj.art-be5c51b910654e0fa3603f81e237810f
institution Directory Open Access Journal
issn 1999-5903
language English
last_indexed 2024-04-13T09:55:56Z
publishDate 2020-02-01
publisher MDPI AG
record_format Article
series Future Internet
spelling doaj.art-be5c51b910654e0fa3603f81e237810f
doi 10.3390/fi12020036
citation Future Internet, vol. 12, no. 2, article 36 (2020)
affiliation Yi Zhou: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
affiliation Fenglei Yang: School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
title Latent Structure Matching for Knowledge Transfer in Reinforcement Learning
topic latent structure matching
reinforcement learning
transfer learning
action advice
policy improvement
mountain car
url https://www.mdpi.com/1999-5903/12/2/36