VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning
Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-o...
Main Authors: | Zintgraf, L, Shiarlis, K, Igl, M, Schulze, S, Gal, Y, Hofmann, K, Whiteson, S |
---|---|
Format: | Conference item |
Language: | English |
Published: | International Conference on Learning Representations, 2020 |
_version_ | 1826274057001107456 |
---|---|
author | Zintgraf, L; Shiarlis, K; Igl, M; Schulze, S; Gal, Y; Hofmann, K; Whiteson, S |
author_facet | Zintgraf, L; Shiarlis, K; Igl, M; Schulze, S; Gal, Y; Hofmann, K; Whiteson, S |
author_sort | Zintgraf, L |
collection | OXFORD |
description | Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods. |
first_indexed | 2024-03-06T22:37:39Z |
format | Conference item |
id | oxford-uuid:5a769c20-c56b-43f9-8a6f-2c424a9f133c |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T22:37:39Z |
publishDate | 2020 |
publisher | International Conference on Learning Representations |
record_format | dspace |
spelling | oxford-uuid:5a769c20-c56b-43f9-8a6f-2c424a9f133c; 2022-03-26T17:15:59Z; VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:5a769c20-c56b-43f9-8a6f-2c424a9f133c; English; Symplectic Elements; International Conference on Learning Representations; 2020; Zintgraf, L; Shiarlis, K; Igl, M; Schulze, S; Gal, Y; Hofmann, K; Whiteson, S; Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods. |
title | VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning |
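
The description above says that a Bayes-optimal policy conditions on the agent's uncertainty about the environment as well as on the environment state. As a minimal sketch of that idea only, the snippet below keeps an exact Beta posterior over a single unknown Bernoulli reward probability and appends its mean and variance to the state. This is a toy conjugate update chosen for illustration, not the variational inference that variBAD meta-learns; the names `BetaBelief` and `hyper_state` are hypothetical and do not come from the paper or its code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaBelief:
    """Beta(alpha, beta) posterior over an unknown Bernoulli reward probability.

    Illustrative assumption: the "task" is one unknown success probability.
    """
    alpha: float = 1.0  # prior pseudo-count of rewards equal to 1
    beta: float = 1.0   # prior pseudo-count of rewards equal to 0

    def update(self, reward: int) -> "BetaBelief":
        # Conjugate update: reward 1 increments alpha, reward 0 increments beta.
        return BetaBelief(self.alpha + reward, self.beta + (1 - reward))

    @property
    def mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        n = self.alpha + self.beta
        return (self.alpha * self.beta) / (n * n * (n + 1.0))


def hyper_state(env_state: int, belief: BetaBelief) -> tuple:
    # A belief-conditioned policy would take this augmented input
    # instead of the environment state alone.
    return (env_state, belief.mean, belief.variance)


prior = BetaBelief()
belief = prior
for r in [1, 1, 0, 1]:  # made-up reward sequence for the example
    belief = belief.update(r)

print(hyper_state(0, belief))
```

As more rewards are observed the posterior variance shrinks, so a policy that reads the augmented state can, in principle, shift from exploring to exploiting as its uncertainty falls.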