Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies

Bibliographic Details
Main Authors: Carlo Rizzardo, Fei Chen, Darwin Caldwell
Format: Article
Language: English
Published: Frontiers Media S.A., 2023-01-01
Series: Frontiers in Robotics and AI
Subjects: reinforcement learning (RL); robotics; manipulation; variational techniques; dynamics; pushing
Online Access: https://www.frontiersin.org/articles/10.3389/frobt.2022.1067502/full
author Carlo Rizzardo
Fei Chen
Fei Chen
Darwin Caldwell
author_facet Carlo Rizzardo
Fei Chen
Fei Chen
Darwin Caldwell
author_sort Carlo Rizzardo
collection DOAJ
description Reinforcement Learning has shown great potential for robotics. It has demonstrated the capability to solve complex manipulation and locomotion tasks, even by learning end-to-end policies that operate directly on visual input, removing the need for custom perception systems. However, for practical robotics applications its poor sample efficiency, together with the need for large amounts of resources, data, and computation time, can be an insurmountable obstacle. One potential solution to this sample-efficiency issue is the use of simulated environments. However, the discrepancy in visual and physical characteristics between reality and simulation, known as the sim-to-real gap, often significantly reduces the real-world performance of policies trained within a simulator. In this work we propose a sim-to-real technique that trains a Soft Actor-Critic agent together with a decoupled feature extractor and a latent-space dynamics model. The decoupled nature of the method allows the sim-to-real transfer of the feature extractor and of the control policy to be performed independently, while the dynamics model acts as a constraint on the latent representation when finetuning the feature extractor on real-world data. We show how this architecture allows a trained agent to be transferred from simulation to reality without retraining or finetuning the control policy, using real-world data only to adapt the feature extractor. By avoiding training the control policy in the real domain we remove the need to apply Reinforcement Learning to real-world data; instead, we focus only on the unsupervised training of the feature extractor, considerably reducing real-world experience collection requirements. We evaluate the method on sim-to-sim and sim-to-real transfer of a policy for table-top robotic object pushing, and demonstrate that it can adapt to considerable variations in the task observations, such as changes in point of view, colors, and lighting, all while substantially reducing the training time with respect to policies trained directly in the real world.
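
A rough sketch of the transfer scheme outlined in the description above, written by the editor for illustration (it is not the authors' code): a convolutional feature extractor, a latent-space dynamics model, and an unsupervised adaptation objective in which the frozen sim-trained dynamics model constrains the encoder while it is finetuned on real-world transitions. All module sizes and the plain MSE objective are assumptions; the paper's variational formulation and the jointly trained Soft Actor-Critic agent are omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    # Feature extractor: maps an RGB observation to a latent vector.
    # Layer sizes are illustrative, chosen for 84x84 inputs.
    def __init__(self, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 19 * 19, latent_dim)  # 84x84 -> 41x41 -> 19x19

    def forward(self, obs):
        return self.fc(self.conv(obs))

class LatentDynamics(nn.Module):
    # Latent-space dynamics model: predicts the next latent state from the
    # current latent state and the action.
    def __init__(self, latent_dim=50, action_dim=4, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )

    def forward(self, z, action):
        return self.net(torch.cat([z, action], dim=-1))

def adaptation_loss(encoder, dynamics, obs, action, next_obs):
    # Sim-to-real stage: the sim-trained dynamics model is kept frozen and
    # acts as a constraint on the latent representation, so that encoded
    # real observations stay consistent with the dynamics the policy saw
    # in simulation. (The paper's variational terms, which would also guard
    # against latent collapse, are omitted here.)
    z, z_next = encoder(obs), encoder(next_obs)
    return F.mse_loss(dynamics(z, action), z_next)

# Hypothetical adaptation step on a batch of real transitions (dummy data).
encoder, dynamics = Encoder(), LatentDynamics()
for p in dynamics.parameters():
    p.requires_grad_(False)  # the dynamics model stays fixed
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)
obs, next_obs = torch.rand(8, 3, 84, 84), torch.rand(8, 3, 84, 84)
action = torch.rand(8, 4)
loss = adaptation_loss(encoder, dynamics, obs, action, next_obs)
optimizer.zero_grad()
loss.backward()
optimizer.step()
# The control policy (not shown) keeps consuming encoder(obs) unchanged.

Because only the encoder is updated, real-world collection reduces to gathering (observation, action, next observation) triples; no reward signal and no policy training in the real domain are required, which is what keeps the real-world data requirements low.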
first_indexed 2024-04-10T23:32:01Z
format Article
id doaj.art-9f50428d665d46929f6a26d52ab8ee23
institution Directory Open Access Journal
issn 2296-9144
language English
last_indexed 2024-04-10T23:32:01Z
publishDate 2023-01-01
publisher Frontiers Media S.A.
record_format Article
series Frontiers in Robotics and AI
spelling doaj.art-9f50428d665d46929f6a26d52ab8ee23 | 2023-01-12T05:32:21Z | eng | Frontiers Media S.A. | Frontiers in Robotics and AI | ISSN 2296-9144 | 2023-01-01 | Volume 9 | doi:10.3389/frobt.2022.1067502 | Article 1067502 | Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies
Carlo Rizzardo: Active Perception and Robot Interactive Learning Laboratory, Advanced Robotics, Istituto Italiano di Tecnologia, Genova, Italy
Fei Chen: Active Perception and Robot Interactive Learning Laboratory, Advanced Robotics, Istituto Italiano di Tecnologia, Genova, Italy
Fei Chen: Department of Mechanical and Automation Engineering, T-Stone Robotics Institute, The Chinese University of Hong Kong, Hong Kong, China
Darwin Caldwell: Active Perception and Robot Interactive Learning Laboratory, Advanced Robotics, Istituto Italiano di Tecnologia, Genova, Italy
title Sim-to-real via latent prediction: Transferring visual non-prehensile manipulation policies
topic reinforcement learning (RL)
robotics
manipulation
variational techniques
dynamics
pushing
url https://www.frontiersin.org/articles/10.3389/frobt.2022.1067502/full