Summary: | <p>One of the goals of artificial intelligence research is to create decision-makers (i.e., <em>agents</em>) that improve from experience (i.e., <em>data</em>) collected through interaction with an environment. Models of the environment (i.e., <em>world models</em>) are an explicit way for agents to represent their knowledge, enabling them to make counterfactual predictions and <em>plans</em> without requiring additional environment interactions. Although agents that plan with a perfect model of the environment have led to impressive demonstrations, e.g., super-human performance in board games, they are limited to problems for which their designer can specify a perfect model. Therefore, <em>learning</em> models from experience holds the promise of going beyond the designers’ reach, giving rise to a self-improving virtuous circle of (i) learning a model from past experience; (ii) planning with the learned model; and (iii) interacting with the environment to collect new experience. Ideally, learned models should <em>generalise</em> to situations beyond their training regime. Nonetheless, this is ambitious and often unrealistic when the models are learned from finite data. The resulting models are generally imperfect, and naive planning with them could be catastrophic in novel, <em>out-of-training distribution</em> situations. A more pragmatic goal is to have agents that are aware of and quantify their lack of knowledge (i.e., <em>ignorance or epistemic uncertainty</em>).</p>
<p>In this thesis, we motivate ignorance-aware agents that plan with learned models, propose novel ones, and demonstrate their effectiveness. Naively applying powerful planning algorithms to learned models can yield poor results when the planning algorithm exploits the model imperfections in out-of-training distribution situations. This phenomenon is often termed overoptimisation and can be addressed by optimising ignorance-augmented objectives, called knowledge equivalents. We verify the validity of our ideas and methods in a number of problem settings, including learning from (i) expert demonstrations (<em>imitation learning</em>, §3); (ii) sub-optimal demonstrations (<em>social learning</em>, §4); and (iii) interacting with an environment with rewards (<em>reinforcement learning</em>, §5). Our empirical evidence is based on simulated autonomous driving environments, continuous control and video games from pixels, and didactic small-scale grid-worlds. Throughout the thesis, we use <em>neural networks</em> to parameterise the (learnable) models and either use existing scalable <em>deep learning</em> methods for approximate ignorance quantification, such as ensembles, or introduce novel planning-specific ways to quantify the agents’ ignorance.</p>
<p>The main chapters of this thesis are based on publications (Filos et al., 2020, 2021, 2022).</p>
|