Fourier policy gradients
We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts integrals that arise with expected policy gradients (EPG) as convolutions and turns them into multiplications. The resulting analytical solutions allow us to capture the low-variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
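As a minimal illustration of the convolution idea described above (a sketch, not the paper's own derivation): for a one-dimensional Gaussian policy and a single radial-basis critic feature, the expected-policy-gradient integral is a Gaussian convolution and therefore has a closed form. The Python sketch below, with illustrative parameter names `mu`, `sigma`, `c`, and `ell` (none of which come from the paper), compares that analytical value against a standard score-function Monte Carlo estimate.

```python
import numpy as np

# Hedged sketch: closed-form expected policy gradient for a 1-D Gaussian
# policy pi(a) = N(a; mu, sigma^2) and one (hypothetical) RBF critic
# feature Q(a) = exp(-(a - c)^2 / (2 ell^2)). Because the product of two
# Gaussians is an unnormalised Gaussian, the integral
#   E_a[ d/dmu log pi(a) * Q(a) ]
# reduces to sqrt(2*pi*ell^2) * N(mu; c, sigma^2 + ell^2) * (c - mu)/(sigma^2 + ell^2).

def analytic_epg(mu, sigma, c, ell):
    """Exact value of E_{a~N(mu,sigma^2)}[ (a-mu)/sigma^2 * exp(-(a-c)^2/(2 ell^2)) ]."""
    s2 = sigma**2 + ell**2
    gauss = np.exp(-(mu - c)**2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)  # N(mu; c, s2)
    return np.sqrt(2 * np.pi * ell**2) * gauss * (c - mu) / s2

def monte_carlo_epg(mu, sigma, c, ell, n=1_000_000, seed=0):
    """Noisy score-function estimate of the same quantity, for comparison."""
    rng = np.random.default_rng(seed)
    a = rng.normal(mu, sigma, size=n)
    score = (a - mu) / sigma**2               # d/dmu log N(a; mu, sigma^2)
    q = np.exp(-(a - c)**2 / (2 * ell**2))    # RBF critic feature
    return np.mean(score * q)

mu, sigma, c, ell = 0.3, 0.5, 1.0, 0.8
print(analytic_epg(mu, sigma, c, ell))      # exact, zero-variance
print(monte_carlo_epg(mu, sigma, c, ell))   # fluctuates around the exact value
```

The analytical value is exact and has zero variance, while the Monte Carlo estimate only fluctuates around it; this gap is the low-variance benefit the abstract refers to, here shown for the simplest possible policy-critic pair.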
Main Authors: | Fellows, M; Ciosek, K; Whiteson, S |
---|---|
Format: | Conference item |
Published: | Journal of Machine Learning Research, 2018 |
id | oxford-uuid:ea16c478-a846-4751-a22d-7f9ba165071f |
---|---|
institution | University of Oxford |
collection | OXFORD |