Experience Replay Using Transition Sequences

Experience replay is one of the most commonly used approaches to improve the sample efficiency of reinforcement learning algorithms. In this work, we propose an approach to select and replay sequences of transitions in order to accelerate the learning of a reinforcement learning agent in an off-policy setting. In addition to selecting appropriate sequences, we also artificially construct transition sequences using information gathered from previous agent-environment interactions. These sequences, when replayed, allow value function information to trickle down to larger sections of the state/state-action space, thereby making the most of the agent's experience. We demonstrate our approach on modified versions of standard reinforcement learning tasks such as the mountain car and puddle world problems, and empirically show that it enables faster and more accurate learning of value functions compared with other forms of experience replay. Further, we briefly discuss possible extensions to this work, as well as applications and situations where this approach could be particularly useful.
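The abstract above describes replaying whole sequences of transitions so that value information from one part of a sequence propagates to the state-action pairs that precede it. As a rough illustration of that idea, here is a minimal sketch of sequence replay with tabular off-policy Q-learning. The function names, the tabular setting, the backward replay order, and all constants are assumptions made here for illustration; they are not details taken from the article.

    # Minimal sketch of replaying a stored transition sequence with tabular
    # off-policy Q-learning. Illustrative only: the tabular setting, the
    # backward replay order, and all constants are assumptions, not details
    # from the article itself.
    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.99      # hypothetical learning rate and discount
    ACTIONS = [0, 1, 2]           # hypothetical discrete action set

    Q = defaultdict(float)        # Q[(state, action)] -> value estimate

    def q_update(s, a, r, s_next):
        # One off-policy Q-learning backup:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s',b) - Q(s,a))
        best_next = max(Q[(s_next, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

    def replay_sequence(sequence):
        # Replay a stored sequence of (s, a, r, s') transitions in reverse,
        # so a reward found at the end of the sequence can reach every
        # earlier state-action pair in a single replay pass.
        for (s, a, r, s_next) in reversed(sequence):
            q_update(s, a, r, s_next)

    # Example: a hand-made three-step sequence ending in a reward of +1.
    replay_sequence([(0, 1, 0.0, 1), (1, 2, 0.0, 2), (2, 0, 1.0, 3)])

Replaying in reverse is one plausible way to realize the "trickle down" effect mentioned in the abstract: after a single pass over the example sequence, the terminal reward has influenced the value estimates of all three state-action pairs that led to it.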

Bibliographic Details
Main Authors: Thommen George Karimpanal, Roland Bouffanais
Format: Article
Language: English
Published: Frontiers Media S.A., 2018-06-01
Series: Frontiers in Neurorobotics
ISSN: 1662-5218
DOI: 10.3389/fnbot.2018.00032
Subjects: experience replay; Q-learning; off-policy; multi-task reinforcement learning; probabilistic policy reuse
Online Access: https://www.frontiersin.org/article/10.3389/fnbot.2018.00032/full