Towards truly open-ended reinforcement learning
Main author: | Parker-Holder, J
---|---
Other authors: | Roberts, S
Format: | Thesis
Language: | English
Published: | 2022
Institution: | University of Oxford
Record ID: | oxford-uuid:e20bd6be-2a47-4e0f-bb6f-30df290e58c8
Abstract:

Deep reinforcement learning (RL) has achieved remarkable successes in the past decade, from superhuman performance in games to real-world problems such as robotics and even nuclear fusion. Indeed, given its generality, many prominent researchers in the field believe RL alone may be sufficient for producing Artificial General Intelligence (AGI). It is easy to see why: RL is, in theory, an open-ended process in which an agent never stops learning from its own experiences, given a suitably complex environment. In this thesis we posit that the key factor limiting RL agents is the requirement for static, human-designed configurations. On the agent side, we typically tune a single set of hyperparameters for a specific agent with a specific architecture, ignoring the fact that these may need to adapt over time. At the same time, even if we did have a powerful RL agent, we lack a sufficiently complex environment that could facilitate the learning of general behaviors.
We hypothesize that the only way to get past this problem is to embrace open-endedness, designing systems with an infinite capacity to produce new, interesting things. In the first part we introduce new methods that use a population of agents to adapt several important hyperparameters on the fly, making it possible for agents to adjust their own hyperparameters over the course of training. Next we introduce a new approach for automatically designing environments, evolving a curriculum that constantly proposes new challenges at the frontier of the student agent's capabilities. Combining these two advances could produce an open-ended learning system in which agents and environments co-adapt over time, producing increasingly complex problems and an agent that can solve them.
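To make the hyperparameter-adaptation idea concrete, the sketch below shows a generic population-based training loop in which weaker members of a population copy and perturb the hyperparameters of stronger ones. It is purely illustrative: the class names, the exploit-and-explore rule, and the random scoring stand-in are assumptions for this sketch, not the algorithms introduced in the thesis.

```python
# Generic population-based hyperparameter adaptation (illustrative only).
import copy
import random


class Member:
    def __init__(self, hypers):
        self.hypers = dict(hypers)   # e.g. {"lr": 3e-4, "gamma": 0.99}
        self.weights = None          # agent parameters (placeholder)
        self.score = float("-inf")   # latest evaluation return


def train_and_evaluate(member):
    """Placeholder for a chunk of RL training followed by evaluation."""
    member.score = random.random()   # stand-in for a real evaluation return
    return member.score


def exploit_and_explore(population, perturb=0.2):
    """Bottom members copy a top member's weights/hypers, then perturb the hypers."""
    ranked = sorted(population, key=lambda m: m.score, reverse=True)
    cutoff = max(1, len(ranked) // 4)
    for loser in ranked[-cutoff:]:
        winner = random.choice(ranked[:cutoff])
        loser.weights = copy.deepcopy(winner.weights)
        loser.hypers = {k: v * random.choice([1 - perturb, 1 + perturb])
                        for k, v in winner.hypers.items()}


population = [Member({"lr": 3e-4, "gamma": 0.99}) for _ in range(8)]
for generation in range(10):
    for member in population:
        train_and_evaluate(member)
    exploit_and_explore(population)   # hyperparameters adapt as training proceeds
```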
However, even this would not be truly open-ended, as it would eventually plateau once the agent can solve every task from a human-specified distribution. In the second part of the thesis we propose directions to make this system unbounded. First, we introduce two new approaches for encouraging the discovery of diverse solutions, which we show can help avoid deceptive local optima and discover a broader set of behaviors. Finally, we posit that one path towards a truly open-ended system is to remove the need for human-designed simulated environments altogether and instead train agents inside learned world models. We discuss several contributions in this area, including active data acquisition to improve the world model, as well as an approach for producing synthetic experiences inside the world model that increase the robustness of our agents. We conclude with a proposal for a future system combining these insights, which we believe could be truly open-ended.
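The world-model direction can likewise be sketched generically: roll the current policy forward inside a learned dynamics model to generate synthetic transitions for training. The toy dynamics, reward, and policy below are invented for illustration and stand in for components that would in practice be learned from the agent's own data; they are not the models or methods developed in the thesis.

```python
# Generating synthetic experience inside a (toy) learned world model.
import numpy as np


class WorldModel:
    """Stand-in for a learned dynamics model p(s', r | s, a)."""
    def __init__(self, state_dim, rng):
        self.state_dim = state_dim
        self.rng = rng

    def step(self, state, action):
        # A real model would be trained on collected experience; this is a toy.
        next_state = state + 0.1 * action + 0.01 * self.rng.normal(size=self.state_dim)
        reward = -float(np.linalg.norm(next_state))
        return next_state, reward


def imagine_rollout(model, policy, start_state, horizon=15):
    """Roll the policy forward inside the model to produce synthetic transitions."""
    state, trajectory = start_state, []
    for _ in range(horizon):
        action = policy(state)
        next_state, reward = model.step(state, action)
        trajectory.append((state, action, reward, next_state))
        state = next_state
    return trajectory


rng = np.random.default_rng(0)
model = WorldModel(state_dim=4, rng=rng)
policy = lambda s: -0.5 * s                       # hypothetical linear policy
synthetic = imagine_rollout(model, policy, start_state=np.zeros(4))
# `synthetic` can be mixed into the agent's training data as extra experience.
```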