Planning with learned ignorance-aware models
Main Author: | Filos, A |
---|---|
Other Authors: | Gal, Y; Grefenstette, E; Foerster, J |
Format: | Thesis |
Language: | English |
Published: | 2022 |
Subjects: | Machine learning; Artificial intelligence; Reinforcement learning; Planning |
One of the goals of artificial intelligence research is to create decision-makers (i.e., *agents*) that improve from experience (i.e., *data*) collected through interaction with an environment. Models of the environment (i.e., *world models*) are an explicit way for agents to represent their knowledge, enabling them to make counterfactual predictions and *plans* without requiring additional environment interactions. Although agents that plan with a perfect model of the environment have led to impressive demonstrations, e.g., super-human performance in board games, they are limited to problems for which their designer can specify a perfect model. *Learning* models from experience therefore holds the promise of going beyond the scope of their designers' reach, giving rise to a self-improving virtuous circle of (i) learning a model from past experience; (ii) planning with the learned model; and (iii) interacting with the environment to collect new experience. Ideally, learned models should *generalise* to situations beyond their training regime. Nonetheless, this is ambitious and often unrealistic when finite data is used to learn the models, leading to generally imperfect models with which naive planning can be catastrophic in novel, *out-of-training-distribution* situations. A more pragmatic goal is to have agents that are aware of and quantify their lack of knowledge (i.e., their *ignorance*, or epistemic uncertainty).
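The learn-plan-interact loop described above can be summarised schematically. The sketch below is not taken from the thesis; it is a minimal illustration assuming hypothetical `WorldModel`, `Planner`, replay-buffer and `env` interfaces, and it shows only the control flow of (i) model learning, (ii) planning, and (iii) data collection.

```python
# Minimal sketch of the model-based agent loop described above.
# All interfaces (env, world_model, planner, replay_buffer) are assumed/hypothetical.

def model_based_agent_loop(env, world_model, planner, replay_buffer, num_iterations=100):
    """Alternate between model learning, planning, and environment interaction."""
    observation = env.reset()
    for _ in range(num_iterations):
        # (i) Learn/refit the world model on all experience gathered so far.
        world_model.fit(replay_buffer)

        # (ii) Plan with the learned model: evaluate candidate action sequences
        # in imagination, without additional environment interaction.
        action = planner.plan(world_model, observation)

        # (iii) Interact with the real environment and store the new experience,
        # which is then used to improve the model in the next iteration.
        next_observation, reward, done = env.step(action)
        replay_buffer.add(observation, action, reward, next_observation, done)
        observation = env.reset() if done else next_observation
```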
In this thesis, we motivate, propose, and demonstrate the effectiveness of novel ignorance-aware agents that plan with learned models. Naively applying powerful planning algorithms to learned models can yield negative results when the planning algorithm exploits the model's imperfections in out-of-training-distribution situations. This phenomenon is often termed *overoptimisation* and can be addressed by optimising ignorance-augmented objectives, called *knowledge equivalents*. We verify the validity of our ideas and methods in a number of problem settings, including learning from (i) expert demonstrations (*imitation learning*, §3); (ii) sub-optimal demonstrations (*social learning*, §4); and (iii) interaction with an environment that provides rewards (*reinforcement learning*, §5). Our empirical evidence is drawn from simulated autonomous driving environments, continuous control and video games from pixels, and didactic small-scale grid-worlds. Throughout the thesis, we use *neural networks* to parameterise the (learnable) models and either rely on existing scalable *deep learning* methods for approximate ignorance quantification, such as ensembles, or introduce novel planning-specific ways to quantify the agents' ignorance.
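To make the idea of an ignorance-augmented objective concrete, the sketch below (not from the thesis) scores a candidate action sequence under an ensemble of learned models and penalises the ensemble disagreement, a standard proxy for epistemic uncertainty. The member interface `rollout_return` and the penalty weight `beta` are illustrative assumptions.

```python
import numpy as np

def ignorance_augmented_return(ensemble, initial_state, action_sequence, beta=1.0):
    """Score a candidate plan under an ensemble of learned models.

    Each ensemble member rolls out the same action sequence; the mean predicted
    return is the usual planning objective, and the standard deviation across
    members (disagreement, a proxy for epistemic uncertainty / ignorance) is
    subtracted, so the planner is discouraged from out-of-training-distribution
    plans it knows little about. Illustrative sketch only; `rollout_return` is
    an assumed member interface returning the predicted sum of rewards.
    """
    member_returns = np.array([
        member.rollout_return(initial_state, action_sequence)
        for member in ensemble
    ])
    return member_returns.mean() - beta * member_returns.std()


def plan(ensemble, initial_state, candidate_action_sequences, beta=1.0):
    """Pick the candidate plan with the best ignorance-augmented score."""
    scores = [
        ignorance_augmented_return(ensemble, initial_state, actions, beta)
        for actions in candidate_action_sequences
    ]
    return candidate_action_sequences[int(np.argmax(scores))]
```

A larger `beta` makes the planner more conservative wherever the ensemble members disagree, which is the behaviour the abstract describes as avoiding overoptimisation of an imperfect model.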
The main chapters of this thesis are based on publications (Filos et al., 2020, 2021, 2022).