Breaking the deadly triad with a target network

The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for...

Bibliographic details
Main authors:	Zhang, S, Yao, H, Whiteson, S
Format:	Conference item
Language:	English
Published:	PMLR 2021
