A novel framework for policy mirror descent with general parameterization and linear convergence

Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe their success to the use of parameterized policies. However, while theoretical guarantees have been established for this class of algorithms, especially in the tabular setting, the use of general parameterization...

Bibliographic Details
Main Authors: Alfano, C; Yuan, R; Rebeschini, P
Format: Conference item
Language: English
Published: Curran Associates 2024