Bayesian Bellman operators

Full description

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian Bellman operator (BBO) framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions. In this paper, we use BBO to provide a rigorous theoretical analysis of model-free Bayesian RL to better understand its relationship to established frequentist RL methodologies. We prove that Bayesian solutions are consistent with frequentist RL solutions, even when approximate inference is used, and derive conditions under which convergence properties hold. Empirically, we demonstrate that algorithms derived from the BBO framework have sophisticated deep exploration properties that enable them to solve continuous control tasks at which state-of-the-art regularised actor-critic algorithms fail catastrophically.
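The central claim of the abstract can be made concrete with a short sketch. The display below is illustrative only and uses notation assumed here rather than taken from the paper: \(\mathcal{D}\) is a dataset of transitions \((s_i, a_i, r_i, s'_i)\), \(Q_\omega\) is a bootstrapped target function with discount \(\gamma\), and \(\hat{B}_\phi\) is a parametric approximation of the Bellman operator. Because the regression targets are themselves Bellman backups, conditioning on them yields a posterior over the operator parameters \(\phi\), not over a value function:

% Illustrative sketch, assumed notation (not the paper's own derivation):
% the targets b_i are bootstrapped Bellman backups, so Bayes' rule gives
% a posterior over the parameters phi of an approximate Bellman operator.
\[
  p(\phi \mid \mathcal{D}) \;\propto\; p(\phi)\,
  \prod_{i=1}^{N} p\bigl(b_i \mid s_i, a_i, \phi\bigr),
  \qquad
  b_i = r_i + \gamma \max_{a'} Q_\omega(s'_i, a').
\]

Sampling from such a posterior, however it is approximated, is one route to the deep exploration behaviour the abstract describes.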

Bibliographic Details
Main Authors: Fellows, M; Hartikainen, K; Whiteson, S
Format: Conference item
Language: English
Published: NeurIPS, 2022
Institution: University of Oxford
Record ID: oxford-uuid:9e50316f-7e83-4236-9e49-4546bbe03871