Truncated emphatic temporal difference methods for prediction and control
Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems. First, fo...
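For context on the followon traces mentioned in the abstract, below is a minimal sketch of a single update of standard Emphatic TD(0) with linear function approximation. This is background only, not the truncated-trace variant introduced in the paper; the function and variable names (`etd0_step`, `phi_s`, `interest`) are illustrative assumptions.

```python
import numpy as np

def etd0_step(w, F_prev, rho_prev, rho, interest, phi_s, phi_next, reward,
              gamma=0.99, alpha=0.01):
    """One standard ETD(0) update (background sketch, not the paper's truncated method)."""
    # Followon trace: discounted, importance-weighted accumulation of interest.
    F = gamma * rho_prev * F_prev + interest
    # TD error under linear value estimates.
    delta = reward + gamma * phi_next @ w - phi_s @ w
    # Emphatic update: with lambda = 0 the emphasis M_t equals F_t.
    w = w + alpha * rho * F * delta * phi_s
    return w, F
```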
Authors: Zhang, S; Whiteson, S
Format: Journal article
Language: English
Published: Journal of Machine Learning Research, 2022
Similar resources
- Temporal and Dynamic Characteristics of Vowels under Neutral and Emphatic
  by Sergey V. Batalin, et al.
  Published: (2018-11-01)
- On the Concept of Emphatic Rheme
  by Yelena Mkhitarian
  Published: (2005-10-01)
- The Study of Emphatic L
  by Zohreh Keyani, et al.
  Published: (2013-04-01)
- The Study of Emphatic L
  by Reza Shokrani, et al.
  Published: (2013-05-01)
- EMPHATIC APOLOGY IN GERMAN LINGUACULTURE
  by Oleksandra M. Shumiatska
  Published: (2019-12-01)