Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms simultaneously explore, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of &l...
Main Author: | Cesare Carissimo |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2024-01-01 |
Series: | IEEE Access |
Subjects: | Braess paradox; chaos; congestion games; learning dynamics; reinforcement learning; Q-learning |
Online Access: | https://ieeexplore.ieee.org/document/10414037/ |
_version_ | 1797330328030806016 |
---|---|
author | Cesare Carissimo |
author_facet | Cesare Carissimo |
author_sort | Cesare Carissimo |
collection | DOAJ |
description | Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms explore simultaneously, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of $Q$-learners in a famous congestion dilemma, the Braess paradox. I find ranges of the exploration rate for which $\epsilon$-greedy $Q$-learners show chaotic and oscillatory dynamics which do not converge, and which yield better-than-Nash-equilibrium results. I decouple the dynamics endogenous to $Q$-learning from the exogenous exploration rate $\epsilon$, and find that $Q$-learners implicitly coordinate with low exploration rates $\epsilon \in (0, 0.1)$, but are disrupted in their coordination for larger exploration rates $\epsilon > 0.1$. The best implicit coordination leads to a 20% reduction in average travel times, which approaches the social optimum. I discuss how these results may inform multi-agent algorithm design, fit within a cognitive science perspective of cognitive noise during learning, and provide a mechanistic hypothesis for the lack of empirical evidence of the Braess paradox in traffic systems. |
first_indexed | 2024-03-08T07:18:18Z |
format | Article |
id | doaj.art-3d8d7052e8434d11b013ad6d66e82fd7 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T07:18:18Z |
publishDate | 2024-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-3d8d7052e8434d11b013ad6d66e82fd7; 2024-02-03T00:02:26Z; eng; IEEE; IEEE Access; 2169-3536; 2024-01-01; 12; 15984; 15996; 10.1109/ACCESS.2024.3358608; 10414037; Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma; Cesare Carissimo (https://orcid.org/0000-0002-4383-7279), Computational Social Science Group, ETH Zürich, Zürich, Switzerland; https://ieeexplore.ieee.org/document/10414037/; Braess paradox; chaos; congestion games; learning dynamics; reinforcement learning; Q-learning |
spellingShingle | Cesare Carissimo; Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma; IEEE Access; Braess paradox; chaos; congestion games; learning dynamics; reinforcement learning; Q-learning |
title | Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma |
title_full | Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma |
title_fullStr | Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma |
title_full_unstemmed | Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma |
title_short | Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma |
title_sort | counter intuitive effects of italic q italic learning exploration in a congestion dilemma |
topic | Braess paradox; chaos; congestion games; learning dynamics; reinforcement learning; Q-learning |
url | https://ieeexplore.ieee.org/document/10414037/ |
work_keys_str_mv | AT cesarecarissimo counterintuitiveeffectsofitalicqitaliclearningexplorationinacongestiondilemma |
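
The description above refers to $\epsilon$-greedy $Q$-learners repeatedly choosing routes in a Braess network. The following is a minimal sketch of that kind of setup, not the article's actual model: the route names, the normalized textbook link costs (congestible links cost the fraction of agents using them, fixed links cost 1, the shortcut costs 0), the stateless Q-update, and all parameter defaults are assumptions made for illustration. Under these assumed costs the all-`cross` Nash outcome has average travel time 2.0 and the even `up`/`down` split has 1.5.

```python
import random

ROUTES = ("up", "down", "cross")  # hypothetical route labels


def travel_times(counts, n):
    """Per-route travel time in a textbook Braess network (assumed costs)."""
    up, down, cross = (counts[r] / n for r in ROUTES)
    first_link = up + cross     # congestible link shared by "up" and "cross"
    second_link = down + cross  # congestible link shared by "down" and "cross"
    return {
        "up": first_link + 1.0,             # congestible link, then fixed link
        "down": 1.0 + second_link,          # fixed link, then congestible link
        "cross": first_link + second_link,  # both congestible links, free shortcut
    }


def epsilon_greedy(q, epsilon, rng):
    """With probability epsilon pick a uniformly random route, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(ROUTES)
    return max(ROUTES, key=lambda r: q[r])


def q_update(q, action, reward, alpha, gamma):
    """Stateless Q-learning update (assumed form) for the chosen route only."""
    target = reward + gamma * max(q.values())
    q[action] += alpha * (target - q[action])


def simulate(n_agents=100, epsilon=0.05, alpha=0.1, gamma=0.0, steps=500, seed=0):
    """Return the average travel time per step for independent Q-learners."""
    rng = random.Random(seed)
    qs = [{r: 0.0 for r in ROUTES} for _ in range(n_agents)]
    history = []
    for _ in range(steps):
        actions = [epsilon_greedy(q, epsilon, rng) for q in qs]
        counts = {r: actions.count(r) for r in ROUTES}
        times = travel_times(counts, n_agents)
        for q, a in zip(qs, actions):
            q_update(q, a, -times[a], alpha, gamma)  # reward = negative travel time
        history.append(sum(times[a] for a in actions) / n_agents)
    return history
```

Calling `simulate(epsilon=...)` for several exploration rates and comparing the resulting averages against the 2.0 and 1.5 reference values is one way to probe the qualitative effect the description reports; the numbers this sketch produces should not be read as the article's results.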