Expected policy gradients for reinforcement learning

We propose expected policy gradients (EPG), which unify stochastic policy gradients (SPG) and deterministic policy gradients (DPG) for reinforcement learning. Inspired by expected sarsa, EPG integrates (or sums) across actions when estimating the gradient, instead of relying only on the action in th...
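The contrast the abstract draws, between using only the sampled action and summing across all actions, can be illustrated with a minimal sketch. It assumes a single state, a softmax policy over three discrete actions, and action values that are already known; the function names, logits, and Q values are hypothetical choices made for illustration and are not taken from the paper.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def grad_log_pi(probs, action):
    """Gradient of log pi(action) w.r.t. the logits of a softmax policy."""
    return [(1.0 if k == action else 0.0) - p for k, p in enumerate(probs)]


def spg_estimate(probs, q_values, action):
    """Stochastic estimate: uses only the single sampled action."""
    return [g * q_values[action] for g in grad_log_pi(probs, action)]


def epg_estimate(probs, q_values):
    """Expected estimate: sums the same term over every action, weighted by pi."""
    grad = [0.0] * len(probs)
    for a, p in enumerate(probs):
        g = grad_log_pi(probs, a)
        for k in range(len(grad)):
            grad[k] += p * g[k] * q_values[a]
    return grad


# Hypothetical logits and action values for one state (assumptions).
logits = [0.1, -0.3, 0.5]
q_values = [1.0, 2.0, 0.5]
probs = softmax(logits)

exact = epg_estimate(probs, q_values)

# Averaging many single-sample estimates approaches the summed estimate.
rng = random.Random(0)
n_samples = 20000
spg_mean = [0.0] * len(probs)
for _ in range(n_samples):
    a = rng.choices(range(len(probs)), weights=probs)[0]
    sample = spg_estimate(probs, q_values, a)
    spg_mean = [m + s / n_samples for m, s in zip(spg_mean, sample)]

print("summed over actions:", exact)
print("mean of samples:    ", spg_mean)
```

In this toy setting the summed estimate needs no sampling over actions at all, whereas the single-sample estimate only matches it on average; continuous action spaces would require integration instead of the sum.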

Full description

Bibliographic details
Main authors:	Ciosek, K, Whiteson, S
Format:	Journal article
Language:	English
Published:	Journal of Machine Learning Research 2020

Similar items

Expected policy gradients
by: Ciosek, K, et al.
Published: (2018)

Fourier policy gradients
by: Fellows, M, et al.
Published: (2018)

OFFER: Off-environment reinforcement learning
by: Ciosek, K, et al.
Published: (2017)

Robust reinforcement learning with Bayesian optimisation and quadrature
by: Paul, S, et al.
Published: (2020)

Alternating optimisation and quadrature for robust control
by: Paul, S, et al.
Published: (2018)

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning
by: Kim, Dong-Ki, et al.
Published: (2022)

Deep reinforcement learning with robust deep deterministic policy gradient
by: Teckchai Tiong, et al.
Published: (2020)

Fingerprint policy optimisation for robust reinforcement learning
by: Paul, S, et al.
Published: (2019)

Counterfactual multi-agent policy gradients
by: Foerster, J, et al.
Published: (2018)

Fast efficient hyperparameter tuning for policy gradient methods
by: Paul, S, et al.
Published: (2019)

Mean-variance policy iteration for risk-averse reinforcement learning
by: Zhang, S, et al.
Published: (2021)

Exploration in Gradient-Based Reinforcement Learning
by: Meuleau, Nicolas, et al.
Published: (2004)

Loaded DiCE: Trading off bias and variance in any-order score function gradient estimators for reinforcement learning
by: Farquhar, G, et al.
Published: (2019)

Inverse reinforcement learning from failure
by: Shiarlis, K, et al.
Published: (2016)

Distributed Bayesian learning with stochastic natural gradient expectation propagation and the posterior server
by: Hasenclever, L, et al.
Published: (2017)

FACMAC: Factored multi-agent centralised policy gradients
by: Peng, B, et al.
Published: (2022)

Multileave gradient descent for fast online learning to rank
by: Whiteson, S, et al.
Published: (2016)

Deep residual reinforcement learning
by: Zhang, S, et al.
Published: (2020)

Learning retrospective knowledge with reverse reinforcement learning
by: Zhang, S, et al.
Published: (2020)

Bayesian action decoder for deep multi-agent reinforcement learning
by: Whiteson, S
Published: (2019)

Reinforcement Learning by Policy Search
by: Peshkin, Leonid
Published: (2004)

Learning to communicate with Deep multi-agent reinforcement learning
by: Foerster, J, et al.
Published: (2016)

Deep variational reinforcement learning for POMDPs
by: Igl, M, et al.
Published: (2018)

GradientDICE: rethinking generalized offline estimation of stationary values
by: Zhang, S, et al.
Published: (2020)

VIREL: A variational inference framework for reinforcement learning
by: Fellows, M, et al.
Published: (2019)

Stabilization Policy, Expected Output and Employment.
by: Bond, S
Published: (1988)

On Expectations, Government Policy and the Rate of Investment.
by: Nickell, S
Published: (1974)

Learning and expectations in macroeconomics
by: Evans, George W., 1949-, et al.
Published: (2001)

Exploration in approximate hyper-state space for meta reinforcement learning
by: Zintgraf, L, et al.
Published: (2021)

Transient non-stationarity and generalisation in deep reinforcement learning
by: Igl, M, et al.
Published: (2021)

Verifiable reinforcement learning via policy extraction
by: Solar Lezama, Armando, et al.
Published: (2021)

Verified probabilistic policies for deep reinforcement learning
by: Bacci, E, et al.
Published: (2022)

Off-policy reinforcement learning with Gaussian processes
by: Chowdhary, Girish, et al.
Published: (2015)

Nonparametric Bayesian Policy Priors for Reinforcement Learning
by: Doshi-Velez, Finale P., et al.
Published: (2011)

Multi-agent common knowledge reinforcement learning
by: de Witt, C, et al.
Published: (2019)

Policy gradient methods for linear quadratic problems
by: Yang, H
Published: (2022)

TreeQN and ATreeC: differentiable tree planning for deep reinforcement learning
by: Farquhar, G, et al.
Published: (2018)

Inflation-Target Expectations and Optimal Monetary Policy.
by: Kapadia, S
Published: (2005)

Inflation-target expectations and optimal monetary policy
by: Kapadia, S
Published: (2005)

Reinforcement learning enhanced quantum-inspired algorithm for combinatorial optimization
by: Beloborodov, D, et al.
Published: (2020)