Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning

In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm on the task of learning robotic reaching and grasping skills in a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach achieve better performance than the compared state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
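For readers who want a concrete picture of the architecture the abstract describes, the sketch below is a minimal, illustrative PyTorch rendering: a convolutional autoencoder whose latent code is shared by the actor and the critic, an ensemble of forward models whose learning progress yields an intrinsic reward, and a combined exploration reward. The input resolution, latent size, ensemble size, learning-progress measure, and mixing weight `eta` are assumptions for illustration, not the paper's published values.

```python
# Minimal, illustrative PyTorch sketch of the components the abstract
# names. All sizes, the learning-progress measure, and `eta` are
# assumptions for illustration, not values from the paper.
import torch
import torch.nn as nn

LATENT, ACT_DIM, N_MODELS = 256, 6, 5  # assumed dimensions / ensemble size

class ConvAutoencoder(nn.Module):
    """Encodes 3x64x64 frames to a latent code and reconstructs them;
    the latent code is the shared input of the actor and the critic."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2), nn.ReLU(),    # 64x64 -> 31x31
            nn.Conv2d(32, 64, 4, 2), nn.ReLU(),   # 31x31 -> 14x14
            nn.Flatten(), nn.Linear(64 * 14 * 14, LATENT),
        )
        self.fc = nn.Linear(LATENT, 64 * 14 * 14)
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, 2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2), nn.Sigmoid(),  # back to 64x64
        )

    def forward(self, x):
        z = self.encoder(x)
        recon = self.decoder(self.fc(z).view(-1, 64, 14, 14))
        return z, recon

# Actor and critic heads both read the autoencoder's latent code.
actor = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                      nn.Linear(128, ACT_DIM), nn.Tanh())
critic = nn.Sequential(nn.Linear(LATENT, 128), nn.ReLU(),
                       nn.Linear(128, 1))  # state-value estimate V(z)

# Ensemble of predictive world models: each maps (z, a) -> predicted z'.
ensemble = [nn.Sequential(nn.Linear(LATENT + ACT_DIM, 128), nn.ReLU(),
                          nn.Linear(128, LATENT)) for _ in range(N_MODELS)]
prev_err = [None] * N_MODELS  # last prediction error seen per model

def intrinsic_reward(z, a, z_next):
    """Learning progress as the drop in each model's prediction error
    since its previous evaluation (one plausible reading of the abstract)."""
    progress = 0.0
    with torch.no_grad():
        for i, model in enumerate(ensemble):
            err = ((model(torch.cat([z, a], -1)) - z_next) ** 2).mean().item()
            if prev_err[i] is not None:
                progress += max(prev_err[i] - err, 0.0)
            prev_err[i] = err
    return progress / N_MODELS

def combined_reward(r_ext, r_int, eta=0.5):
    """Exploration signal: extrinsic reward plus eta-weighted intrinsic
    reward; eta is a hypothetical mixing coefficient."""
    return r_ext + eta * r_int
```

The design choice worth noting is that the representation is trained by reconstruction (and, per the abstract, by value estimation on the centre-most layer) rather than by the task reward alone, which is what makes learning from pixels data-efficient in sparse-reward settings.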


Bibliographic Details
Main Authors: Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan (Department of Informatics, University of Hamburg, Germany)
Format: Article
Language: English
Published: De Gruyter, 2019-01-01
Series: Paladyn, Vol. 10, Iss. 1, pp. 14-29
ISSN: 2081-4836
Subjects: deep reinforcement learning; actor-critic; continuous control; efficient exploration; neuro-robotics
Online Access: https://doi.org/10.1515/pjbr-2019-0005