Improving single and multi-agent deep reinforcement learning methods
Main Author: | Gupta, T
---|---
Other Authors: | Whiteson, S
Format: | Thesis
Language: | English
Published: | 2023
Institution: | University of Oxford
Subjects: | Machine learning
ID: | oxford-uuid:2ee45333-e42c-440c-bfe9-3430beeb653c
<p>Reinforcement Learning (RL) is a framework in which an agent learns to make decisions from data-driven feedback, received as rewards or penalties for the actions it takes in its environment. Deep RL integrates deep learning with RL, harnessing deep neural networks to process complex, high-dimensional data. Within this framework, the machine learning research community has made tremendous progress in enabling machines to make sequential decisions over long time horizons. These advances include attaining super-human performance on Atari games [Mnih et al., 2015], mastering the game of Go and beating the human world champion [Silver et al., 2017], and powering robust recommendation systems [Gomez-Uribe and Hunt, 2015; Singh et al., 2021]. This thesis identifies key challenges that impede the learning of RL agents in their environments and improves the corresponding methods, leading to better agent performance, higher sample efficiency, and more generalizable learned policies.</p>
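<p>To make this interaction loop concrete, here is a minimal sketch (not code from the thesis) that runs one episode of a standard Gymnasium control task, with a random policy standing in for a learned one; the environment choice is arbitrary.</p>

```python
# Minimal agent-environment interaction loop; a learned policy would
# replace the random action sampling below.
import gymnasium as gym

env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)
episode_return = 0.0
done = False
while not done:
    action = env.action_space.sample()  # placeholder for a learned policy
    obs, reward, terminated, truncated, info = env.step(action)
    episode_return += reward  # the feedback signal the agent learns from
    done = terminated or truncated
env.close()
print(f"episode return: {episode_return}")
```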
<p>In Part I of the thesis, we focus on exploration in single-agent RL settings, where an agent must interact with a complex environment to pursue a goal. An agent that fails to explore its environment is unlikely to achieve high performance: it misses critical rewards and consequently cannot learn optimal behavior. A key difficulty arises in sparse-reward environments, where the agent receives feedback only once the task is completed. We propose a novel method that enables semantic exploration, yielding higher sample efficiency and better performance on sparse-reward tasks.</p>
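<p>The semantic-exploration method itself is not reproduced here. Purely to illustrate the sparse-reward difficulty it targets, the hypothetical sketch below augments a mostly-zero task reward with a simple count-based novelty bonus, a standard exploration heuristic; the function name, state encoding, and beta coefficient are illustrative assumptions.</p>

```python
# Count-based exploration bonus for sparse-reward settings.
# Illustrative only; this is NOT the semantic-exploration method
# proposed in the thesis.
from collections import defaultdict
import math

visit_counts = defaultdict(int)

def shaped_reward(state, extrinsic_reward, beta=0.1):
    """Add an intrinsic novelty bonus to a mostly-zero task reward."""
    key = tuple(state)  # assumes the state can be discretised
    visit_counts[key] += 1
    return extrinsic_reward + beta / math.sqrt(visit_counts[key])

# The bonus decays as a state becomes familiar:
print(shaped_reward((0, 1), extrinsic_reward=0.0))  # 0.1 on first visit
print(shaped_reward((0, 1), extrinsic_reward=0.0))  # ~0.071 on second visit
```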
<p>In Part II of the thesis, we focus on cooperative Multi-Agent Reinforcement Learning (MARL), an extension of the single-agent RL setting in which multiple agents interact in the same environment on a shared task. In multi-agent tasks that require significant coordination and impose strict penalties for miscoordination, state-of-the-art MARL methods often fail to learn useful behaviors because agents get stuck in a sub-optimal equilibrium. A further challenge is exploration in the joint action space of all agents, which grows exponentially with the number of agents. To address these challenges, we propose approaches such as universal value exploration and scalable role-based learning. These methods improve coordination among agents, speed up exploration, and enhance the agents’ ability to adapt to new environments and tasks, showing zero-shot generalization and higher sample efficiency. Lastly, we investigate independent policy-based methods in cooperative MARL, where each agent treats the other agents as part of the environment. We show that such methods can outperform state-of-the-art joint learning approaches on a popular multi-agent benchmark.</p>
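<p>The exponential growth of the joint action space is easy to make concrete: with |A| actions per agent and n agents, there are |A|^n joint actions. The short sketch below (an illustration, not thesis code) tabulates this blow-up and notes why independent learners sidestep it.</p>

```python
# Joint action space size: |A|**n for n agents with |A| actions each.
actions_per_agent = 5
for n_agents in (2, 4, 8):
    joint_actions = actions_per_agent ** n_agents
    print(f"{n_agents} agents -> {joint_actions:,} joint actions")
# 2 agents -> 25; 4 -> 625; 8 -> 390,625. Independent learners avoid
# searching this space: each agent treats the others as part of the
# environment and explores only its own 5 actions.
```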
<p>In summary, the contributions of this thesis significantly improve the state-of-the-art in deep (multi-agent) reinforcement learning. The agents developed in this thesis can explore their environments efficiently to improve sample efficiency, learn tasks that require significant multi-agent coordination, and achieve zero-shot generalization across various tasks.</p>