VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning

Full description

Bibliographic details
Main Authors: Zintgraf, L, Shiarlis, K, Igl, M, Schulze, S, Gal, Y, Hofmann, K, Whiteson, S
Format: Conference item
Language: English
Published: International Conference on Learning Representations, 2020
author Zintgraf, L
Shiarlis, K
Igl, M
Schulze, S
Gal, Y
Hofmann, K
Whiteson, S
collection OXFORD
description Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is however intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.
first_indexed 2024-03-06T22:37:39Z
format Conference item
id oxford-uuid:5a769c20-c56b-43f9-8a6f-2c424a9f133c
institution University of Oxford
language English
last_indexed 2024-03-06T22:37:39Z
publishDate 2020
publisher International Conference on Learning Representations
record_format dspace
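The abstract's central idea is that a Bayes-optimal policy acts on its belief about the task, not on the environment state alone. As a minimal sketch, assume a hypothetical one-dimensional grid-world in which the goal cell is unknown and the reward is 1 only at the goal. Here the exact posterior over the goal location can be maintained in closed form, which is tractable only because the task space is tiny; variBAD instead meta-learns an approximate (variational) posterior for tasks where this exact update is intractable. The names `update_belief`, `choose_cell` and `search` are hypothetical, and the greedy `choose_cell` rule is a simple belief-conditioned heuristic, not the Bayes-optimal policy.

```python
import numpy as np


def update_belief(belief: np.ndarray, cell: int, reward: float) -> np.ndarray:
    """Exact Bayesian update of a belief over which cell holds the goal.

    Assumed observation model (hypothetical toy task): reward is 1.0 at the
    goal cell and 0.0 everywhere else.
    """
    likelihood = np.zeros_like(belief)
    if reward > 0:
        likelihood[cell] = 1.0      # goal is exactly at the visited cell
    else:
        likelihood[:] = 1.0
        likelihood[cell] = 0.0      # goal is anywhere except the visited cell
    posterior = belief * likelihood
    return posterior / posterior.sum()  # renormalise (Bayes' rule)


def choose_cell(belief: np.ndarray) -> int:
    # Belief-conditioned heuristic: visit the currently most probable goal cell.
    # np.argmax breaks ties by returning the first maximal index.
    return int(np.argmax(belief))


def search(n_cells: int, goal: int) -> list:
    """Explore until the goal is found; returns the sequence of visited cells."""
    belief = np.full(n_cells, 1.0 / n_cells)  # uniform prior over goal location
    visited = []
    while True:
        cell = choose_cell(belief)
        reward = 1.0 if cell == goal else 0.0
        visited.append(cell)
        belief = update_belief(belief, cell, reward)
        if reward > 0:
            return visited


print(search(4, 2))
```

With four cells and the goal in cell 2, the agent rules out cells 0 and 1 one at a time, each time shifting probability mass onto the remaining candidates, and finds the goal on its third visit, so the call prints `[0, 1, 2]`.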