Experience Replay Using Transition Sequences
Experience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent in an off-policy setting. In addition to selecting appropriate sequences, we also artificially construct transition sequences using information gathered from previous agent-environment interactions. These sequences, when replayed, allow value function information to trickle down to larger sections of the state/state-action space, thereby making the most of the agent's experience. We demonstrate our approach on modified versions of standard reinforcement learning tasks such as the mountain car and puddle world problems and empirically show that it enables faster and more accurate learning of value functions compared with other forms of experience replay. Further, we briefly discuss some of the possible extensions to this work, as well as applications and situations where this approach could be particularly useful.
Main Authors: | Thommen George Karimpanal, Roland Bouffanais
---|---|
Format: | Article
Language: | English
Published: | Frontiers Media S.A., 2018-06-01
Series: | Frontiers in Neurorobotics
ISSN: | 1662-5218
Subjects: | experience replay; Q-learning; off-policy; multi-task reinforcement learning; probabilistic policy reuse
Online Access: | https://www.frontiersin.org/article/10.3389/fnbot.2018.00032/full
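For orientation only, the sketch below illustrates the general idea described in the abstract: an off-policy, tabular Q-learning agent that stores whole transition sequences and replays them in reverse order so that value estimates propagate back towards earlier states. It is not the algorithm from the article; the environment interface (`env.reset()`, `env.step(a)` returning `(next_state, reward, done)`), the hyperparameters, and all helper names are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Illustrative sketch only: tabular Q-learning with replay of whole
# transition sequences, replayed backwards so reward information
# trickles back along each sequence. Hyperparameters and the `env`
# interface are assumptions, not taken from the article.

ALPHA, GAMMA, EPSILON = 0.1, 0.99, 0.1

def q_learning_with_sequence_replay(env, actions, episodes=200,
                                    replay_every=10, buffer_size=50):
    Q = defaultdict(float)      # Q[(state, action)] -> value estimate
    sequence_buffer = []        # list of stored transition sequences

    def update(s, a, r, s_next):
        # Standard off-policy Q-learning backup.
        best_next = max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    for episode in range(episodes):
        s = env.reset()
        sequence, done = [], False
        while not done:
            # Epsilon-greedy behaviour policy.
            if random.random() < EPSILON:
                a = random.choice(actions)
            else:
                a = max(actions, key=lambda a2: Q[(s, a2)])
            s_next, r, done = env.step(a)
            update(s, a, r, s_next)
            sequence.append((s, a, r, s_next))
            s = s_next

        # Keep only the most recent sequences.
        sequence_buffer.append(sequence)
        sequence_buffer = sequence_buffer[-buffer_size:]

        # Periodically replay stored sequences in reverse order so that
        # value information reaches earlier states in each sequence.
        if (episode + 1) % replay_every == 0:
            for seq in sequence_buffer:
                for (s_i, a_i, r_i, s_n) in reversed(seq):
                    update(s_i, a_i, r_i, s_n)

    return Q
```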