Cautious reinforcement learning with logical constraints
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies are synthesised to satisfy a goal, expressed as a temporal logic formula, with maximal probability. E...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | International Foundation for Autonomous Agents and Multiagent Systems, 2020 |