Learning to Plan via Deep Optimistic Value Exploration
Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble o...
Main Authors: | , , , |
---|---|
Other Authors: | |
Format: | Article |
Published: |
2020
|
Online Access: | https://hdl.handle.net/1721.1/125161 |