Deep residual reinforcement learning


Bibliographic Details
Main Authors: Zhang, S, Boehmer, W, Whiteson, S
Format: Conference item
Language: English
Published: International Foundation for Autonomous Agents and Multiagent Systems, 2020
Description: <p>We revisit residual algorithms in both model-free and model-based reinforcement learning settings. We propose the bidirectional target network technique to stabilize residual algorithms, yielding a residual version of DDPG that significantly outperforms vanilla DDPG in the DeepMind Control Suite benchmark. Moreover, we find the residual algorithm an effective approach to the distribution mismatch problem in model-based planning. Compared with the existing TD(k) method, our residual-based method makes weaker assumptions about the model and yields a greater performance boost.</p>
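The residual algorithms the abstract refers to minimize the squared Bellman residual directly, differentiating through the bootstrapped target as well as the prediction, whereas semi-gradient TD holds the target fixed. The following is a minimal illustrative sketch of that distinction in a linear toy setting; the function names and the two-state MDP are invented for illustration, and this is not the paper's bidirectional-target-network technique or its residual DDPG variant.

```python
import numpy as np

def residual_td_update(w, phi_s, phi_s2, r, gamma, lr):
    """One residual-gradient step for a linear value function v(s) = w @ phi(s).

    The squared Bellman residual is differentiated through BOTH the
    prediction v(s) and the bootstrapped target r + gamma * v(s')."""
    delta = r + gamma * (w @ phi_s2) - (w @ phi_s)  # Bellman residual
    # Semi-gradient TD would use grad = -delta * phi_s (target held fixed);
    # the residual gradient also flows through the next-state term:
    grad = delta * (gamma * phi_s2 - phi_s)
    return w - lr * grad

# Tiny deterministic chain: s0 --r=1--> s1 --r=0--> terminal
phi = {0: np.array([1.0, 0.0]), 1: np.array([0.0, 1.0])}
w = np.zeros(2)
for _ in range(2000):
    w = residual_td_update(w, phi[0], phi[1], r=1.0, gamma=0.9, lr=0.1)
    w = residual_td_update(w, phi[1], np.zeros(2), r=0.0, gamma=0.9, lr=0.1)
# True values are v(s0) = 1, v(s1) = 0; w converges toward [1.0, 0.0]
```

With one-hot features the weights are just per-state values, which makes it easy to check that the residual updates drive both Bellman residuals to zero.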
Record ID: oxford-uuid:265eddeb-b77a-4c8b-99c9-f927a69f7928
Institution: University of Oxford