Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma

Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms simultaneously explore, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of ...

Bibliographic Details
Main Author: Cesare Carissimo
Format: Article
Language: English
Published: IEEE 2024-01-01
Series: IEEE Access
Subjects: Braess paradox; chaos; congestion games; learning dynamics; reinforcement learning; Q-learning
Online Access: https://ieeexplore.ieee.org/document/10414037/
_version_ 1797330328030806016
author Cesare Carissimo
author_facet Cesare Carissimo
author_sort Cesare Carissimo
collection DOAJ
description Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms simultaneously explore, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of $Q$-learners in a famous congestion dilemma, the Braess paradox. I find ranges of the exploration rate for which $\epsilon$-greedy $Q$-learners show chaotic and oscillatory dynamics which do not converge, and yield better than Nash equilibrium results. I decouple the dynamics endogenous to $Q$-learning from the exogenous exploration rate $\epsilon$, and find that $Q$-learners implicitly coordinate with low exploration rates $\epsilon \in (0, 0.1)$, but are disrupted in their coordination for larger exploration rates $\epsilon > 0.1$. The best implicit coordination leads to a 20% reduction in average travel times which approaches the social optimum. I discuss how our results may inform multi-agent algorithm design, fit within a cognitive science perspective of cognitive noise during learning, and provide a mechanistic hypothesis for the lack of empirical evidence of the Braess Paradox in traffic systems.
first_indexed 2024-03-08T07:18:18Z
format Article
id doaj.art-3d8d7052e8434d11b013ad6d66e82fd7
institution Directory Open Access Journal
issn 2169-3536
language English
last_indexed 2024-03-08T07:18:18Z
publishDate 2024-01-01
publisher IEEE
record_format Article
series IEEE Access
spelling doaj.art-3d8d7052e8434d11b013ad6d66e82fd7 2024-02-03T00:02:26Z eng IEEE IEEE Access 2169-3536 2024-01-01 12 15984 15996 10.1109/ACCESS.2024.3358608 10414037 Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma Cesare Carissimo 0 https://orcid.org/0000-0002-4383-7279 Computational Social Science Group, ETH Zürich, Zürich, Switzerland Exploration is an integral part of learning dynamics which allows algorithms to search a space of solutions. When many algorithms simultaneously explore, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of $Q$-learners in a famous congestion dilemma, the Braess paradox. I find ranges of the exploration rate for which $\epsilon$-greedy $Q$-learners show chaotic and oscillatory dynamics which do not converge, and yield better than Nash equilibrium results. I decouple the dynamics endogenous to $Q$-learning from the exogenous exploration rate $\epsilon$, and find that $Q$-learners implicitly coordinate with low exploration rates $\epsilon \in (0, 0.1)$, but are disrupted in their coordination for larger exploration rates $\epsilon > 0.1$. The best implicit coordination leads to a 20% reduction in average travel times which approaches the social optimum. I discuss how our results may inform multi-agent algorithm design, fit within a cognitive science perspective of cognitive noise during learning, and provide a mechanistic hypothesis for the lack of empirical evidence of the Braess Paradox in traffic systems. https://ieeexplore.ieee.org/document/10414037/ Braess paradox chaos congestion games learning dynamics reinforcement learning Q-learning
spellingShingle Cesare Carissimo
Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
IEEE Access
Braess paradox
chaos
congestion games
learning dynamics
reinforcement learning
Q-learning
title Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
title_full Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
title_fullStr Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
title_full_unstemmed Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
title_short Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma
title_sort counter intuitive effects of italic q italic learning exploration in a congestion dilemma
topic Braess paradox
chaos
congestion games
learning dynamics
reinforcement learning
Q-learning
url https://ieeexplore.ieee.org/document/10414037/
work_keys_str_mv AT cesarecarissimo counterintuitiveeffectsofitalicqitaliclearningexplorationinacongestiondilemma