Fourier policy gradients

Full description

We propose a new way of deriving policy gradient updates for reinforcement learning. Our technique, based on Fourier analysis, recasts the integrals that arise with expected policy gradients (EPG) as convolutions and turns them into multiplications. The resulting analytical solutions allow us to capture the low-variance benefits of EPG in a broad range of settings. For the critic, we treat trigonometric and radial basis functions, two function families with the universal approximation property. The choice of policy can be almost arbitrary, including mixtures or hybrid continuous-discrete probability distributions. Moreover, we derive a general family of sample-based estimators for stochastic policy gradients, which unifies existing results on sample-based approximation. We believe that this technique has the potential to shape the next generation of policy gradient approaches, powered by analytical results.
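
As an illustrative sketch only (not the authors' implementation, and with assumed toy values for mu, sigma, omega, and phi), the snippet below shows the core idea for a one-dimensional Gaussian policy and a single trigonometric critic feature: the expectation over actions is a Gaussian convolution of the critic, so the convolution theorem (equivalently, the Gaussian characteristic function) gives a closed form for the value and its gradient, which the script checks against a Monte Carlo score-function estimate.

    # Illustrative sketch (assumed toy setting, not the paper's implementation):
    # for a 1-D Gaussian policy pi(a) = N(a; mu, sigma^2) and a single
    # trigonometric critic feature Q_hat(a) = cos(omega * a + phi), the
    # expectation E_pi[Q_hat] is Q_hat convolved with a zero-mean Gaussian,
    # evaluated at mu. The convolution theorem gives the closed form
    #   exp(-0.5 * sigma^2 * omega^2) * cos(omega * mu + phi),
    # so the gradient with respect to mu is analytical as well.
    import numpy as np

    mu, sigma = 0.7, 0.5      # policy mean and standard deviation (assumed values)
    omega, phi = 3.0, 0.2     # frequency and phase of the critic feature (assumed values)

    # Analytical expectation and its gradient with respect to the policy mean
    damp = np.exp(-0.5 * (sigma * omega) ** 2)
    analytic_value = damp * np.cos(omega * mu + phi)
    analytic_grad_mu = -omega * damp * np.sin(omega * mu + phi)

    # Monte Carlo check: sample-based value and score-function gradient estimate
    rng = np.random.default_rng(0)
    a = rng.normal(mu, sigma, size=1_000_000)
    score = (a - mu) / sigma ** 2                  # d/d(mu) of log N(a; mu, sigma^2)
    mc_value = np.cos(omega * a + phi).mean()
    mc_grad_mu = (np.cos(omega * a + phi) * score).mean()

    print(f"E[Q]       analytic {analytic_value:+.4f}   Monte Carlo {mc_value:+.4f}")
    print(f"dE[Q]/dmu  analytic {analytic_grad_mu:+.4f}   Monte Carlo {mc_grad_mu:+.4f}")

The analytical gradient involves no sampling over actions, which is the low-variance property the description refers to; in this toy setting the Monte Carlo estimates should agree with it up to sampling noise.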

Bibliographic Details
Main Authors: Fellows, M; Ciosek, K; Whiteson, S
Format: Conference item
Published: Journal of Machine Learning Research, 2018