A simple mixture policy parameterization for improving sample efficiency of CVaR optimization

Reinforcement learning algorithms utilizing policy gradients (PG) to optimize Conditional Value at Risk (CVaR) face significant challenges with sample inefficiency, hindering their practical applications. This inefficiency stems from two main facts: a focus on tail-end performance that overlooks man...

Bibliographic details
Main authors: Luo, Y, Pan, Y, Wang, H, Torr, P, Poupart, P
Format: Conference item
Language: English
Published: University of Massachusetts Amherst 2024