VariBAD: a very good method for Bayes-adaptive deep RL via meta-learning


Bibliographic details
Main authors:	Zintgraf, L, Shiarlis, K, Igl, M, Schulze, S, Gal, Y, Hofmann, K, Whiteson, S
Format:	Conference item
Language:	English
Published:	International Conference on Learning Representations 2020

Description
Summary:	Trading off exploration and exploitation in an unknown environment is key to maximising expected return during learning. A Bayes-optimal policy, which does so optimally, conditions its actions not only on the environment state but on the agent’s uncertainty about the environment. Computing a Bayes-optimal policy is, however, intractable for all but the smallest tasks. In this paper, we introduce variational Bayes-Adaptive Deep RL (variBAD), a way to meta-learn to perform approximate inference in an unknown environment, and incorporate task uncertainty directly during action selection. In a grid-world domain, we illustrate how variBAD performs structured online exploration as a function of task uncertainty. We further evaluate variBAD on MuJoCo domains widely used in meta-RL and show that it achieves higher online return than existing methods.
