Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, unregularised PMD algorithmically regularises the policy i...

Bibliographic Details
Main Authors: Johnson, E, Pike-Burke, C, Rebeschini, P
Format: Conference item
Language: English
Published: NeurIPS 2023