Truncated emphatic temporal difference methods for prediction and control
Emphatic Temporal Difference (TD) methods are a class of off-policy Reinforcement Learning (RL) methods involving the use of followon traces. Despite the theoretical success of emphatic TD methods in addressing the notorious deadly triad of off-policy RL, there are still two open problems. First, fo...
Main authors: | , |
---|---|
Format: | Journal article |
Language: | English |
Published: | Journal of Machine Learning Research, 2022 |