Bayesian Bellman operators
Main Authors: Fellows, M; Hartikainen, K; Whiteson, S
Format: Conference item
Language: English
Published: NeurIPS, 2022
Institution: University of Oxford
Record ID: oxford-uuid:9e50316f-7e83-4236-9e49-4546bbe03871
Abstract: We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas existing approaches infer a posterior over the transition distribution or Q-function, we characterise the uncertainty in the Bellman operator. Our Bayesian Bellman operator (BBO) framework is motivated by the insight that when bootstrapping is introduced, model-free approaches actually infer a posterior over Bellman operators, not value functions. In this paper, we use BBO to provide a rigorous theoretical analysis of model-free Bayesian RL to better understand its relationship to established frequentist RL methodologies. We prove that Bayesian solutions are consistent with frequentist RL solutions, even when approximate inference is used, and derive conditions under which convergence properties hold. Empirically, we demonstrate that algorithms derived from the BBO framework have sophisticated deep exploration properties that enable them to solve continuous control tasks at which state-of-the-art regularised actor-critic algorithms fail catastrophically.
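To make the abstract's central insight concrete, here is a minimal sketch of how bootstrapping turns model-free Bayesian inference into inference over a Bellman operator rather than a value function. This is our illustration, not code from the paper: it assumes linear function approximation with conjugate Gaussian regression, and the variable names and the helper `posterior_over_bellman_params` are hypothetical.

```python
import numpy as np

# Illustrative sketch (not the paper's algorithm): with bootstrapping, the
# regression targets y = r + gamma * max_a' Q_hat(s', a') depend on the
# current estimate Q_hat, so a Bayesian treatment of the regression
# y ~ phi(s, a)^T w yields a posterior over the parameters w of the
# (empirical, applied) Bellman operator, not directly over a value function.

rng = np.random.default_rng(0)
sigma2 = 0.1   # assumed observation-noise variance
tau2 = 1.0     # assumed prior variance over w

def posterior_over_bellman_params(Phi, y):
    """Conjugate Bayesian linear regression: p(w | Phi, y) = N(mu, Sigma).

    Phi : (N, d) features phi(s, a), one row per observed transition
    y   : (N,) bootstrapped targets r + gamma * max_a' Q_hat(s', a')
    """
    d = Phi.shape[1]
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.eye(d) / tau2)
    mu = Sigma @ Phi.T @ y / sigma2
    return mu, Sigma

# Toy data standing in for one batch of transitions; in a real agent the
# targets y would be recomputed from the current Q_hat at each sweep.
Phi = rng.normal(size=(64, 8))
w_true = rng.normal(size=8)
y = Phi @ w_true + rng.normal(scale=np.sqrt(sigma2), size=64)

mu, Sigma = posterior_over_bellman_params(Phi, y)

# Posterior sampling (Thompson-style) is one way to get the deep exploration
# the abstract refers to: each sampled w defines one plausible Bellman-
# operator output, i.e. one plausible Q-estimate to act greedily against.
w_sample = rng.multivariate_normal(mu, Sigma)
```

The design point the sketch isolates is that the posterior is over w, the parameters fit against bootstrapped targets, so uncertainty attaches to the operator's output given Q_hat; the paper's contribution is the rigorous analysis of what iterating this inference converges to, which the toy above does not attempt.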