Fingerprint policy optimisation for robust reinforcement learning

Policy gradient methods ignore the potential value of adjusting environment variables: unobservable state features that are randomly determined by the environment in a physical setting, but are controllable in a simulator. This can lead to slow learning, or convergence to suboptimal policies, if the...

Bibliographic Details
Main Authors:	Paul, S, Osborne, M, Whiteson, S
Format:	Conference item
Published:	Journal of Machine Learning Research 2019
