Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning
In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm on the task of learning robotic reaching and grasping skills, both in a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach achieve better performance than state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings.
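The architecture described in the abstract (actor and critic heads sharing the centre-most hidden layer of a convolutional autoencoder that is trained both to reconstruct the input and to estimate the state value) can be illustrated in code. The following is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the layer sizes, the 64x64 input resolution, and the action dimensionality are assumptions for illustration.

```python
# Hypothetical sketch of the described architecture (not the authors' code).
# Assumed: 64x64 RGB input, 3-D continuous action, 64-D latent code.
import torch
import torch.nn as nn

class VisuomotorActorCritic(nn.Module):
    def __init__(self, action_dim: int = 3, latent_dim: int = 64):
        super().__init__()
        # Encoder: raw pixels -> centre-most hidden representation z
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=4, stride=2), nn.ReLU(),   # 64 -> 31
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),  # 31 -> 14
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, latent_dim), nn.ReLU(),
        )
        # Decoder: reconstructs the input image, so z must retain visual detail
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 32 * 14 * 14), nn.ReLU(),
            nn.Unflatten(1, (32, 14, 14)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, output_padding=1), nn.ReLU(),  # 14 -> 31
            nn.ConvTranspose2d(16, 3, 4, stride=2), nn.Sigmoid(),                  # 31 -> 64
        )
        # Actor and critic heads both read the shared latent code; the critic's
        # value loss also backpropagates into the encoder, so z is "optimized
        # to estimate the state value" as the abstract describes.
        self.actor = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, action_dim), nn.Tanh(),  # bounded continuous action
        )
        self.critic = nn.Linear(latent_dim, 1)

    def forward(self, obs: torch.Tensor):
        z = self.encoder(obs)
        return self.actor(z), self.critic(z), self.decoder(z)
```

Training such a model would combine three losses over shared parameters: a pixel reconstruction loss for the decoder, a temporal-difference value loss for the critic, and a policy loss for the actor; how the paper weights these is not stated in this record.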
Main Authors: | Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan |
Format: | Article |
Language: | English |
Published: | De Gruyter, 2019-01-01 |
Series: | Paladyn |
Subjects: | deep reinforcement learning; actor-critic; continuous control; efficient exploration; neuro-robotics |
Online Access: | https://doi.org/10.1515/pjbr-2019-0005 |
_version_ | 1797428727546642432 |
author | Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan |
author_facet | Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan |
author_sort | Hafez Muhammad Burhan |
collection | DOAJ |
description | In this paper, we present a new intrinsically motivated actor-critic algorithm for learning continuous motor skills directly from raw visual input. Our neural architecture is composed of a critic and an actor network. Both networks receive the hidden representation of a deep convolutional autoencoder which is trained to reconstruct the visual input, while the centre-most hidden representation is also optimized to estimate the state value. Separately, an ensemble of predictive world models generates, based on its learning progress, an intrinsic reward signal which is combined with the extrinsic reward to guide the exploration of the actor-critic learner. Our approach is more data-efficient and inherently more stable than existing actor-critic methods for continuous control from pixel data. We evaluate our algorithm on the task of learning robotic reaching and grasping skills, both in a realistic physics simulator and on a humanoid robot. The results show that the control policies learned with our approach achieve better performance than state-of-the-art and baseline algorithms in both dense-reward and challenging sparse-reward settings. |
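The intrinsic-reward mechanism also admits a small sketch. The record states only that an ensemble of predictive world models produces an intrinsic reward from its learning progress, which is then combined with the extrinsic reward; the window-based progress estimate and the additive mixing weight below are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical learning-progress intrinsic reward (assumed formulation).
from collections import deque

class LearningProgressReward:
    def __init__(self, window: int = 50, beta: float = 0.5):
        # Rolling buffer of the world-model ensemble's recent prediction errors
        self.errors = deque(maxlen=2 * window)
        self.window = window
        self.beta = beta  # assumed weight of intrinsic vs. extrinsic reward

    def intrinsic(self, ensemble_error: float) -> float:
        """Learning progress: drop in mean prediction error between the
        older and the newer half of the rolling window."""
        self.errors.append(ensemble_error)
        if len(self.errors) < 2 * self.window:
            return 0.0  # not enough history to estimate progress yet
        older = list(self.errors)[:self.window]
        newer = list(self.errors)[self.window:]
        return max(0.0, sum(older) / self.window - sum(newer) / self.window)

    def combined(self, extrinsic: float, ensemble_error: float) -> float:
        # Simple additive mixing; a common choice, not necessarily the paper's
        return extrinsic + self.beta * self.intrinsic(ensemble_error)
```

Rewarding progress rather than raw prediction error steers exploration toward regions the world models are currently learning, while avoiding noisy, unlearnable states whose errors stay permanently high.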
first_indexed | 2024-03-09T09:03:03Z |
format | Article |
id | doaj.art-4ae7b9351faf4e3c984664aeda28b721 |
institution | Directory Open Access Journal |
issn | 2081-4836 |
language | English |
last_indexed | 2024-03-09T09:03:03Z |
publishDate | 2019-01-01 |
publisher | De Gruyter |
record_format | Article |
series | Paladyn |
spelling | doaj.art-4ae7b9351faf4e3c984664aeda28b721 | 2023-12-02T11:06:10Z | eng | De Gruyter | Paladyn | 2081-4836 | 2019-01-01 | vol. 10, no. 1, pp. 14-29 | 10.1515/pjbr-2019-0005 | pjbr-2019-0005 | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning | Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan (all: Department of Informatics, University of Hamburg, Germany) | https://doi.org/10.1515/pjbr-2019-0005 | deep reinforcement learning; actor-critic; continuous control; efficient exploration; neuro-robotics |
spellingShingle | Hafez, Muhammad Burhan; Weber, Cornelius; Kerzel, Matthias; Wermter, Stefan; Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning; Paladyn; deep reinforcement learning; actor-critic; continuous control; efficient exploration; neuro-robotics |
title | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning |
title_full | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning |
title_fullStr | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning |
title_full_unstemmed | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning |
title_short | Deep intrinsically motivated continuous actor-critic for efficient robotic visuomotor skill learning |
title_sort | deep intrinsically motivated continuous actor critic for efficient robotic visuomotor skill learning |
topic | deep reinforcement learning; actor-critic; continuous control; efficient exploration; neuro-robotics |
url | https://doi.org/10.1515/pjbr-2019-0005 |
work_keys_str_mv | AT hafezmuhammadburhan deepintrinsicallymotivatedcontinuousactorcriticforefficientroboticvisuomotorskilllearning AT webercornelius deepintrinsicallymotivatedcontinuousactorcriticforefficientroboticvisuomotorskilllearning AT kerzelmatthias deepintrinsicallymotivatedcontinuousactorcriticforefficientroboticvisuomotorskilllearning AT wermterstefan deepintrinsicallymotivatedcontinuousactorcriticforefficientroboticvisuomotorskilllearning |