GradientDICE: rethinking generalized offline estimation of stationary values
We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020), the current state of the art for estimating such densi...
Main authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published in: | Journal of Machine Learning Research, 2020 |