Distral: robust multitask reinforcement learning
Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy; the shared policy, in turn, is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust to hyperparameter settings and more stable: attributes that are critical in deep reinforcement learning.
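The abstract describes a concrete training scheme: each task policy maximizes its own reward while being regularized toward a shared "distilled" policy, and the shared policy is trained by distillation toward the task policies. The PyTorch-style sketch below is a rough illustration of that idea only, not the authors' implementation; the names `PolicyNet`, `distral_losses`, and the coefficients `c_kl` and `c_ent` are illustrative assumptions, and details such as discounting, soft advantages, and the optimization schedule follow the paper rather than this snippet.

```python
# Hedged sketch of a Distral-style update for small discrete action spaces.
# Assumes on-policy advantage estimates are already available; all names here
# are illustrative, not taken from the paper's code.
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Tiny MLP producing action logits for one policy (task-specific or shared)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)

def distral_losses(task_policy, shared_policy, obs, actions, advantages,
                   c_kl=0.1, c_ent=0.01):
    """Return (task_loss, distill_loss): the task policy is trained on its own
    reward signal while staying close to the shared policy, and the shared
    policy is distilled toward the (frozen) task policy."""
    task_logits = task_policy(obs)
    shared_logits = shared_policy(obs)

    task_dist = torch.distributions.Categorical(logits=task_logits)
    shared_dist = torch.distributions.Categorical(logits=shared_logits)

    # Policy-gradient term for the task policy (advantages treated as constants).
    pg_loss = -(advantages.detach() * task_dist.log_prob(actions)).mean()

    # Keep the task policy close to the shared policy and keep it stochastic.
    kl_to_shared = torch.distributions.kl_divergence(
        task_dist,
        torch.distributions.Categorical(logits=shared_logits.detach())).mean()
    entropy_bonus = task_dist.entropy().mean()
    task_loss = pg_loss + c_kl * kl_to_shared - c_ent * entropy_bonus

    # Distillation: move the shared policy toward the frozen task policy.
    distill_loss = torch.distributions.kl_divergence(
        torch.distributions.Categorical(logits=task_logits.detach()),
        shared_dist).mean()

    return task_loss, distill_loss
```

In the paper both roles come out of a single joint objective optimized across all tasks; the two losses are returned separately here only to make the task-learning and distillation components explicit.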
Main Authors: | Teh, YW, Bapst, V, Czarnecki, WM, Quan, J, Kirkpatrick, J, Hadsell, R, Heess, N, Pascanu, R |
---|---|
Format: | Conference item |
Published: | Massachusetts Institute of Technology Press, 2017 |
author | Teh, YW Bapst, V Czarnecki, WM Quan, J Kirkpatrick, J Hadsell, R Heess, N Pascanu, R |
---|---|
author_sort | Teh, YW |
collection | OXFORD |
description | Most deep reinforcement learning algorithms are data inefficient in complex and rich environments, limiting their applicability to many scenarios. One direction for improving data efficiency is multitask learning with shared neural network parameters, where efficiency may be improved through transfer across related tasks. In practice, however, this is not usually observed, because gradients from different tasks can interfere negatively, making learning unstable and sometimes even less data efficient. Another issue is the different reward schemes between tasks, which can easily lead to one task dominating the learning of a shared model. We propose a new approach for joint training of multiple tasks, which we refer to as Distral (distill & transfer learning). Instead of sharing parameters between the different workers, we propose to share a "distilled" policy that captures common behaviour across tasks. Each worker is trained to solve its own task while constrained to stay close to the shared policy, while the shared policy is trained by distillation to be the centroid of all task policies. Both aspects of the learning process are derived by optimizing a joint objective function. We show that our approach supports efficient transfer on complex 3D environments, outperforming several related methods. Moreover, the proposed learning process is more robust to hyperparameter settings and more stable: attributes that are critical in deep reinforcement learning. |
first_indexed | 2024-03-06T18:41:27Z |
format | Conference item |
id | oxford-uuid:0cfdde8d-8b0b-440a-97b7-7d2a185d1ad6 |
institution | University of Oxford |
last_indexed | 2024-03-06T18:41:27Z |
publishDate | 2017 |
publisher | Massachusetts Institute of Technology Press |
record_format | dspace |
title | Distral: robust multitask reinforcement learning |