Entropic Regularization of Markov Decision Processes

An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment...


Bibliographic details
Main authors: Boris Belousov, Jan Peters
Format: Article
Language: English
Published: MDPI AG 2019-07-01
Series: Entropy
Online access: https://www.mdpi.com/1099-4300/21/7/674