Alternating optimisation and quadrature for robust control
Bayesian optimisation has been successfully applied to a variety of reinforcement learning problems. However, the traditional approach for learning optimal policies in simulators does not utilise the opportunity to improve learning by adjusting certain environment variables: state features that are...
Main Authors: | , , , , , |
---|---|
Format: | Conference item |
Published: | AAAI Press, 2018 |