Breaking the deadly triad with a target network
The deadly triad refers to the instability of a reinforcement learning algorithm when it employs off-policy learning, function approximation, and bootstrapping simultaneously. In this paper, we investigate the target network as a tool for breaking the deadly triad, providing theoretical support for...
Main authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | PMLR 2021 |