GradientDICE: rethinking generalized offline estimation of stationary values

We present GradientDICE for estimating the density ratio between the state distribution of the target policy and the sampling distribution in off-policy reinforcement learning. GradientDICE fixes several problems of GenDICE (Zhang et al., 2020), the current state-of-the-art for estimating such densi...

Бүрэн тодорхойлолт

Номзүйн дэлгэрэнгүй
Үндсэн зохиолчид:	Zhang, S, Liu, B, Whiteson, S
Формат:	Conference item
Хэл сонгох:	English
Хэвлэсэн:	Journal of Machine Learning Research 2020

GradientDICE: rethinking generalized offline estimation of stationary values

Ижил төстэй зүйлс