Expected policy gradients for reinforcement learning
We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in th...
Main authors: | Ciosek, K, Whiteson, S |
---|---|
Format: | Journal article |
Language: | English |
Published: | Journal of Machine Learning Research, 2020 |
Similar records
- Expected policy gradients
  by: Ciosek, K, et al.
  Published: (2018)
- Fourier policy gradients
  by: Fellows, M, et al.
  Published: (2018)
- OFFER: Off-environment reinforcement learning
  by: Ciosek, K, et al.
  Published: (2017)
- Robust reinforcement learning with Bayesian optimisation and quadrature
  by: Paul, S, et al.
  Published: (2020)
- Alternating optimisation and quadrature for robust control
  by: Paul, S, et al.
  Published: (2018)