On Convergence Rate of MRetrace

Off-policy learning is a key setting for reinforcement learning algorithms. In recent years, the stability of off-policy learning for value-based reinforcement learning has been guaranteed even when combined with linear function approximation and bootstrapping. Convergence rate analysis is currently a hot to...

Bibliographic Details
Main Authors: Xingguo Chen, Wangrong Qin, Yu Gong, Shangdong Yang, Wenhao Wang
Format: Article
Language: English
Published: MDPI AG 2024-09-01
Series: Mathematics
Online Access: https://www.mdpi.com/2227-7390/12/18/2930