Truncated emphatic temporal difference methods for prediction and control

Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems. First, fo...


Bibliographic details
Main authors:	Zhang, S, Whiteson, S
Format:	Journal article
Language:	English
Published:	Journal of Machine Learning Research 2022

