Expected policy gradients for reinforcement learning
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in th...
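To illustrate the contrast the abstract draws, here is a minimal sketch for a single state with a discrete action set. It is not the authors' implementation. The softmax parameterisation, the function names (`softmax`, `spg_estimate`, `epg_estimate`) and the assumption that the action values `q_values` are known exactly are all introduced here for illustration only. The sampled estimator uses one action, while the summed estimator weights the policy gradient of every action by its value.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()


def spg_estimate(logits, q_values, action):
    """Single-action estimate: grad log pi(action) * Q(action).

    For a softmax policy, grad log pi(a) w.r.t. the logits is e_a - pi.
    """
    pi = softmax(logits)
    grad_log_pi = -pi            # unary minus returns a new array
    grad_log_pi[action] += 1.0
    return grad_log_pi * q_values[action]


def epg_estimate(logits, q_values):
    """Summed estimate: sum over all actions of grad pi(a) * Q(a).

    For a softmax policy, grad pi(a) w.r.t. the logits is pi(a) * (e_a - pi).
    """
    pi = softmax(logits)
    grad = np.zeros_like(pi)
    for a in range(len(pi)):
        grad_pi_a = -pi[a] * pi
        grad_pi_a[a] += pi[a]
        grad += grad_pi_a * q_values[a]
    return grad


if __name__ == "__main__":
    # Hypothetical logits and action values for one state.
    logits = np.array([0.5, -1.0, 2.0])
    q_values = np.array([1.0, 0.0, -2.0])
    rng = np.random.default_rng(0)
    sampled_action = rng.choice(len(logits), p=softmax(logits))
    print("single-action estimate:", spg_estimate(logits, q_values, sampled_action))
    print("summed estimate:       ", epg_estimate(logits, q_values))
```

In this toy setting the summed estimate equals the expectation of the single-action estimate under the policy, so it removes the variance that comes from sampling the action. With learned rather than exact action values, that equivalence would hold only approximately.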
Main authors: | Ciosek, K, Whiteson, S |
---|---|
Format: | Journal article |
Language: | English |
Published: | Journal of Machine Learning Research, 2020 |
Similar items
- Expected policy gradients
  by: Ciosek, K, et al.
  Published: (2018)
- Fourier policy gradients
  by: Fellows, M, et al.
  Published: (2018)
- OFFER: Off-environment reinforcement learning
  by: Ciosek, K, et al.
  Published: (2017)
- Robust reinforcement learning with Bayesian optimisation and quadrature
  by: Paul, S, et al.
  Published: (2020)
- Alternating optimisation and quadrature for robust control
  by: Paul, S, et al.
  Published: (2018)