Learning to Plan via Deep Optimistic Value Exploration

Deep exploration requires coordinated long-term planning. We present a model-based reinforcement learning algorithm that guides policy learning through a value function that exhibits optimism in the face of uncertainty. We capture uncertainty over values by combining predictions from an ensemble o...

Full description

Bibliographic Details
Main Authors: Seyde, Tim, Schwarting, Wilko, Karaman, Sertac, Rus, Daniela L
Other Authors: Massachusetts Institute of Technology. Computer Science and Artificial Intelligence Laboratory
Format: Article
Published: 2020
Online Access:https://hdl.handle.net/1721.1/125161