Near-Optimal Learning in Sequential Games
Main Author: | Yu, Tiancheng |
---|---|
Other Authors: | Sra, Suvrit |
Format: | Thesis |
Published: | Massachusetts Institute of Technology, 2023 |
Online Access: | https://hdl.handle.net/1721.1/151570 |
_version_ | 1826208008359641088 |
---|---|
author | Yu, Tiancheng |
author2 | Sra, Suvrit |
author_facet | Sra, Suvrit Yu, Tiancheng |
author_sort | Yu, Tiancheng |
collection | MIT |
description | Decision making is ubiquitous, and some problems become particularly challenging due to their sequential nature, where later decisions depend on earlier ones. While humans have been attempting to solve sequential decision making problems for a long time, modern computational and machine learning techniques are needed to find the optimal decision rule. One popular approach is the reinforcement learning (RL) perspective, in which an agent learns the optimal decision rule by receiving rewards based on its actions.
In the presence of multiple learning agents, sequential decision making problems become sequential games. In this setting, the learning objective shifts from finding an optimal decision rule to finding a Nash equilibrium, where none of the agents can increase their reward by unilaterally switching to another decision rule. To handle both the sequential nature of the problem and the presence of the other learning agents, multi-agent RL tasks require even more data than supervised learning and single-agent RL tasks. Consequently, sample efficiency becomes a critical concern for the success of multi-agent RL.
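To fix ideas (the notation here is generic, not quoted from the thesis): writing $\pi = (\pi_1, \dots, \pi_m)$ for the agents' decision rules and $V_i(\pi)$ for agent $i$'s expected total reward, $\pi$ is a Nash equilibrium when no agent can gain by deviating unilaterally,

$$
V_i(\pi_i, \pi_{-i}) \;\ge\; V_i(\pi_i', \pi_{-i}) \qquad \text{for every agent } i \text{ and every alternative rule } \pi_i',
$$

and an $\varepsilon$-approximate Nash equilibrium relaxes this inequality by an additive $\varepsilon$.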
In this thesis, I study what are arguably the most fundamental problems of learning in sequential games:
1. (Lower bound) How many samples are necessary to find a Nash equilibrium in a sequential game, no matter what learning algorithm is used?
2. (Upper bound) How can we design computationally efficient learning algorithms with sharp sample-complexity guarantees? (Both notions are sketched in generic notation below.)
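One generic way to formalize the two questions (illustrative notation, not the thesis's own): let $n^\star(\varepsilon)$ denote the smallest number of samples that suffices for some algorithm to output an $\varepsilon$-approximate Nash equilibrium with high probability. Lower and upper bounds then sandwich this quantity,

$$
\underbrace{\Omega\bigl(f(\text{game size}, \varepsilon)\bigr)}_{\text{lower bound, question 1}} \;\le\; n^\star(\varepsilon) \;\le\; \underbrace{\tilde{O}\bigl(g(\text{game size}, \varepsilon)\bigr)}_{\text{upper bound, question 2}},
$$

with $f$ and $g$ standing in for rates that depend on the game's size and the target accuracy $\varepsilon$.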
When the upper and lower bounds match each other, (minimax) optimal learning is achieved. It turns out that exploiting the structure of sequential games is the key to optimal learning. In this thesis, I investigate near-optimal learning in two types of sequential games (a generic formalization follows the list):
1. (Markov games) All the agents can observe the underlying state (Chapter 2), and
2. (Extensive-form games) Different agents can have different observations given the same state (Chapter 5).
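For concreteness (a standard textbook formalization, not notation taken from the thesis), a finite-horizon Markov game can be written as a tuple

$$
\mathcal{G} \;=\; \bigl(H,\; \mathcal{S},\; \{\mathcal{A}_i\}_{i=1}^m,\; P,\; \{r_i\}_{i=1}^m\bigr),
$$

where $H$ is the horizon, $\mathcal{S}$ is the state space observed by every agent, $\mathcal{A}_i$ is agent $i$'s action space, $P_h(s' \mid s, a_1, \dots, a_m)$ is the transition kernel, and $r_{i,h}(s, a_1, \dots, a_m)$ is agent $i$'s reward at step $h$. Extensive-form games drop the shared-observation assumption: each agent sees only its own information sets, so states an agent cannot distinguish must receive the same action distribution.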
To achieve near-optimal learning, a series of novel algorithmic ideas and analytical tools is introduced, such as
1. (Adaptive uncertainty quantification) Sharp uncertainty quantification of the value-function estimates to design near-optimal exploration bonuses (Chapter 3),
2. (Certified policy) A non-uniform, step-wise reweighting of historical policies to produce approximate Nash equilibrium policies (Chapter 4),
3. (Balanced exploration) Achieving optimal exploration of a game tree based on the size of its subtrees (Chapter 6),
4. (Log-partition function reformulation) Re-interpreting classical algorithms as computing gradients of a log-partition function (Chapter 7; a generic one-line illustration appears below),
which may be of independent interest. |
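As a one-line illustration of the last item (a generic identity, not the thesis's specific reformulation): for a weight vector $\theta \in \mathbb{R}^{A}$ over a finite set of $A$ actions, the log-partition function $\log Z(\theta) = \log \sum_{a} e^{\theta_a}$ satisfies

$$
\frac{\partial}{\partial \theta_a} \log Z(\theta) \;=\; \frac{e^{\theta_a}}{\sum_{a'} e^{\theta_{a'}}},
$$

which is exactly the exponential-weights (softmax) distribution, so algorithms that play such distributions can be viewed as following gradients of a log-partition function.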
first_indexed | 2024-09-23T13:58:52Z |
format | Thesis |
id | mit-1721.1/151570 |
institution | Massachusetts Institute of Technology |
last_indexed | 2024-09-23T13:58:52Z |
publishDate | 2023 |
publisher | Massachusetts Institute of Technology |
record_format | dspace |
spelling | mit-1721.1/151570 2023-08-01T03:29:05Z Near-Optimal Learning in Sequential Games Yu, Tiancheng Sra, Suvrit Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Ph.D. 2023-07-31T19:49:25Z 2023-07-31T19:49:25Z 2023-06 2023-07-13T14:31:07.554Z Thesis https://hdl.handle.net/1721.1/151570 In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/ application/pdf Massachusetts Institute of Technology |
spellingShingle | Yu, Tiancheng Near-Optimal Learning in Sequential Games |
title | Near-Optimal Learning in Sequential Games |
title_full | Near-Optimal Learning in Sequential Games |
title_fullStr | Near-Optimal Learning in Sequential Games |
title_full_unstemmed | Near-Optimal Learning in Sequential Games |
title_short | Near-Optimal Learning in Sequential Games |
title_sort | near optimal learning in sequential games |
url | https://hdl.handle.net/1721.1/151570 |
work_keys_str_mv | AT yutiancheng nearoptimallearninginsequentialgames |