Expected policy gradients

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by Expected SARSA, EPG integrates across the action when estimating the gradient, instead of relying only on the action in the sampl...
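
To illustrate the idea of integrating across the action rather than using only a sampled one, here is a minimal sketch. It assumes a single state, a small discrete action set, a softmax policy parameterised directly by its logits, and a fixed critic given as a list `q_values`. These choices and all function names are hypothetical and chosen for illustration; this is not the authors' implementation, and it does not cover continuous actions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spg_estimate(logits, q_values, action):
    """Single-sample estimate: Q(a) * grad_logits log pi(a) for one action.

    For a softmax policy, grad_logits log pi(a) = onehot(a) - pi.
    """
    pi = softmax(logits)
    return [
        q_values[action] * ((1.0 if i == action else 0.0) - pi[i])
        for i in range(len(logits))
    ]


def epg_estimate(logits, q_values):
    """Sum over all actions: sum_a pi(a) * Q(a) * grad_logits log pi(a).

    For a softmax policy this sum simplifies to pi[i] * (Q[i] - E_pi[Q]).
    """
    pi = softmax(logits)
    expected_q = sum(p * q for p, q in zip(pi, q_values))
    return [p * (q - expected_q) for p, q in zip(pi, q_values)]
```

Under these assumptions, `spg_estimate` depends on which action happened to be sampled, while `epg_estimate` is its expectation under the policy and so carries no variance from action sampling in this state.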

Bibliographic Details
Main Authors: Ciosek, K, Whiteson, S
Format: Conference item
Published: Association for the Advancement of Artificial Intelligence 2018