Planning with learned ignorance-aware models


Bibliographic Details
Main Author: Filos, A
Other Authors: Gal, Y
Format: Thesis
Language:English
Published: 2022
Subjects: Machine learning; Artificial intelligence; Reinforcement learning; Planning
_version_ 1811139284922531840
author Filos, A
author2 Gal, Y
collection OXFORD
description One of the goals of artificial intelligence research is to create decision-makers (i.e., agents) that improve from experience (i.e., data), collected through interaction with an environment. Models of the environment (i.e., world models) are an explicit way for agents to represent their knowledge, enabling them to make counterfactual predictions and plans without requiring additional environment interactions. Although agents that plan with a perfect model of the environment have led to impressive demonstrations, e.g., super-human performance in board games, they are limited to problems for which their designer can specify a perfect model. Therefore, learning models from experience holds the promise of going beyond the scope of their designers’ reach, giving rise to a self-improving virtuous circle of (i) learning a model from past experience; (ii) planning with the learned model; and (iii) interacting with the environment, collecting new experiences. Ideally, learned models should generalise to situations beyond their training regime. Nonetheless, this is ambitious and often unrealistic when finite data is used to learn the models, leading to generally imperfect models, with which naive planning can be catastrophic in novel, out-of-training-distribution situations. A more pragmatic goal is to have agents that are aware of and quantify their lack of knowledge (i.e., ignorance or epistemic uncertainty).

In this thesis, we motivate, propose, and demonstrate the effectiveness of novel ignorance-aware agents that plan with learned models. Naively applying powerful planning algorithms to learned models can produce negative results when the planning algorithm exploits the model’s imperfections in out-of-training-distribution situations. This phenomenon is often termed overoptimisation and can be addressed by optimising ignorance-augmented objectives, called knowledge equivalents. We verify the validity of our ideas and methods in a number of problem settings, including learning from (i) expert demonstrations (imitation learning, §3); (ii) sub-optimal demonstrations (social learning, §4); and (iii) interaction with an environment that provides rewards (reinforcement learning, §5). Our empirical evidence is based on simulated autonomous-driving environments, continuous control and video games from pixels, and didactic small-scale grid-worlds. Throughout the thesis, we use neural networks to parameterise the (learnable) models and either use existing scalable deep-learning methods for approximate ignorance quantification, such as ensembles, or introduce novel planning-specific ways to quantify the agents’ ignorance.

The main chapters of this thesis are based on publications (Filos et al., 2020, 2021, 2022).
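The abstract's central idea, planning with a learned world model while penalising the agent's ignorance, can be illustrated with a small sketch. The snippet below is not the thesis's method: it is a minimal, hypothetical Python/NumPy illustration of (i) approximating ignorance (epistemic uncertainty) as the disagreement of an ensemble of learned dynamics models and (ii) scoring candidate action sequences with an ignorance-augmented objective during planning. All names here (EnsembleModel, plan_with_ignorance_penalty, the penalty weight beta) are assumptions introduced for illustration, not identifiers from the thesis.

```python
# Minimal, hypothetical sketch of ignorance-aware planning with a learned model.
# Ignorance is approximated by ensemble disagreement; planning scores action
# sequences by reward minus an ignorance penalty. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

class EnsembleModel:
    """Ensemble of K linear dynamics models s' = A_k s + B_k a (toy stand-in
    for the neural-network world models described in the abstract)."""
    def __init__(self, state_dim, action_dim, k=5):
        self.A = [np.eye(state_dim) + 0.01 * rng.standard_normal((state_dim, state_dim))
                  for _ in range(k)]
        self.B = [0.1 * rng.standard_normal((state_dim, action_dim)) for _ in range(k)]

    def predict(self, state, action):
        """Return each member's next-state prediction, shape (K, state_dim)."""
        return np.stack([A @ state + B @ action for A, B in zip(self.A, self.B)])

def ignorance(predictions):
    """Epistemic-uncertainty proxy: mean per-dimension std across ensemble members."""
    return predictions.std(axis=0).mean()

def reward(state, action):
    """Toy reward: stay near the origin with small actions."""
    return -np.sum(state ** 2) - 0.01 * np.sum(action ** 2)

def plan_with_ignorance_penalty(model, state, horizon=5, candidates=64, beta=1.0):
    """Random-shooting planner over an ignorance-augmented objective:
    sum of predicted rewards minus beta * accumulated ensemble disagreement."""
    best_score, best_plan = -np.inf, None
    for _ in range(candidates):
        actions = rng.uniform(-1.0, 1.0, size=(horizon, model.B[0].shape[1]))
        s, score = state.copy(), 0.0
        for a in actions:
            preds = model.predict(s, a)
            score += reward(s, a) - beta * ignorance(preds)  # penalise ignorance
            s = preds.mean(axis=0)                           # roll out the mean prediction
        if score > best_score:
            best_score, best_plan = score, actions
    return best_plan, best_score

if __name__ == "__main__":
    model = EnsembleModel(state_dim=3, action_dim=2)
    s0 = rng.standard_normal(3)
    plan, score = plan_with_ignorance_penalty(model, s0)
    print("first planned action:", plan[0], "score:", round(score, 3))
```

The penalty weight beta controls how strongly the planner avoids regions where the ensemble members disagree: with beta = 0 the planner may exploit model imperfections out of the training distribution (the overoptimisation failure described above), while a large beta keeps plans within well-modelled regions at the cost of conservatism.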
first_indexed 2024-09-25T04:03:39Z
format Thesis
id oxford-uuid:7b087e0b-54ef-4c7c-b277-5a3242ed0e54
institution University of Oxford
language English
last_indexed 2024-09-25T04:03:39Z
publishDate 2022
record_format dspace
spelling oxford-uuid:7b087e0b-54ef-4c7c-b277-5a3242ed0e54 | 2024-05-13T09:47:58Z | Planning with learned ignorance-aware models | Thesis | http://purl.org/coar/resource_type/c_db06 | uuid:7b087e0b-54ef-4c7c-b277-5a3242ed0e54 | Machine learning; Artificial intelligence; Reinforcement learning; Planning | English | Hyrax Deposit | 2022 | Filos, A; Gal, Y; Grefenstette, E; Foerster, J
title Planning with learned ignorance-aware models
topic Machine learning
Artificial intelligence
Reinforcement learning
Planning