Entropic Regularization of Markov Decision Processes
An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment...
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-07-01 |
Series: | Entropy |
Subjects: | |
Online access: | https://www.mdpi.com/1099-4300/21/7/674 |