Entropic Regularization of Markov Decision Processes

An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment...


Bibliographic Details
Main Authors:	Boris Belousov, Jan Peters
Format:	Article
Language:	English
Published:	MDPI AG 2019-07-01
Series:	Entropy
Subjects:	maximum entropy reinforcement learning; actor-critic methods; f-divergence; KL control
Online Access:	https://www.mdpi.com/1099-4300/21/7/674
