Entropic Regularization of Markov Decision Processes
An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG 2019-07-01 |
Series: | Entropy |
Subjects: | |
Online Access: | https://www.mdpi.com/1099-4300/21/7/674 |