Optimal convergence rate for exact policy mirror descent in discounted Markov decision processes

Policy Mirror Descent (PMD) is a general family of algorithms that covers a wide range of novel and fundamental methods in reinforcement learning. Motivated by the instability of policy iteration (PI) with inexact policy evaluation, unregularised PMD algorithmically regularises the policy i...

Bibliographic Details
Main Authors: Johnson, E, Pike-Burke, C, Rebeschini, P
Format: Conference item
Language: English
Published: NeurIPS 2023