Counter-Intuitive Effects of Q-Learning Exploration in a Congestion Dilemma

Bibliographic Details
Main Author:	Cesare Carissimo
Format:	Article
Language:	English
Published:	IEEE 2024-01-01
Series:	IEEE Access
Subjects:	Braess paradox, chaos, congestion games, learning dynamics, reinforcement learning, Q-learning
Online Access:	https://ieeexplore.ieee.org/document/10414037/

Description:
Exploration is an integral part of learning dynamics, allowing algorithms to search a space of solutions. When many algorithms explore simultaneously, this can lead to counter-intuitive effects. This paper contributes an analysis of the influence that exploration has on a multi-agent system of Q-learners in a famous congestion dilemma, the Braess paradox. I find ranges of the exploration rate ε for which ε-greedy Q-learners show chaotic and oscillatory dynamics that do not converge, yet yield better-than-Nash-equilibrium results. I decouple the dynamics endogenous to Q-learning from the exogenous exploration rate ε, and find that Q-learners implicitly coordinate at low exploration rates, ε ∈ (0, 0.1), but their coordination is disrupted at larger exploration rates, ε > 0.1. The best implicit coordination leads to a 20% reduction in average travel times, approaching the social optimum. I discuss how these results may inform multi-agent algorithm design, how they fit within a cognitive science perspective on cognitive noise during learning, and how they provide a mechanistic hypothesis for the lack of empirical evidence of the Braess paradox in traffic systems.
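
To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the author's code. It assumes the textbook Braess network: three hypothetical routes `up`, `down` and `cross`, where the links s->A and B->t cost (agents on the link)/N, the links A->t and s->B cost 1, and the bridge A->B costs 0. It also assumes stateless ε-greedy Q-learners with initial Q-values of 0, a hypothetical learning rate `alpha`, and a reward equal to the negative travel time. The function names `travel_times`, `average_time` and `simulate` are likewise illustrative. The paper's exact network, update rule and parameters may differ. In this assumed network, the outcome where every agent takes `cross` is the Nash equilibrium, with an average travel time of 2.0. An even `up`/`down` split is the social optimum, at 1.5.

```python
import random

# Hypothetical route labels for an assumed textbook Braess network:
#   up    = s->A (load-dependent) then A->t (constant 1)
#   down  = s->B (constant 1) then B->t (load-dependent)
#   cross = s->A, the zero-cost bridge A->B, then B->t
ACTIONS = ("up", "down", "cross")


def travel_times(choices, n_agents):
    """Travel time of each route given every agent's route choice."""
    n_up = choices.count("up")
    n_down = choices.count("down")
    n_cross = choices.count("cross")
    s_a = (n_up + n_cross) / n_agents    # fraction of agents on link s->A
    b_t = (n_down + n_cross) / n_agents  # fraction of agents on link B->t
    return {"up": s_a + 1.0, "down": 1.0 + b_t, "cross": s_a + b_t}


def average_time(choices, n_agents):
    """Mean travel time over all agents (the social cost)."""
    times = travel_times(choices, n_agents)
    return sum(times[c] for c in choices) / n_agents


def simulate(n_agents=100, epsilon=0.05, alpha=0.1, rounds=2000, seed=0):
    """Run stateless epsilon-greedy Q-learners and return the mean travel
    time averaged over the last 10% of rounds."""
    rng = random.Random(seed)
    # One Q-value per route per agent, all initialised to 0 (an assumption).
    q = [{a: 0.0 for a in ACTIONS} for _ in range(n_agents)]
    history = []
    for _ in range(rounds):
        choices = []
        for table in q:
            if rng.random() < epsilon:
                # Explore: pick a route uniformly at random.
                choices.append(rng.choice(ACTIONS))
            else:
                # Exploit: pick a greedy route, breaking ties uniformly.
                best = max(table.values())
                greedy = [a for a in ACTIONS if table[a] == best]
                choices.append(rng.choice(greedy))
        times = travel_times(choices, n_agents)
        for table, a in zip(q, choices):
            # Single-state Q-learning with discount 0: the reward is the
            # negative travel time of the route just taken.
            table[a] += alpha * (-times[a] - table[a])
        history.append(average_time(choices, n_agents))
    tail = history[-max(1, rounds // 10):]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Reference points in this assumed network (N = 100):
    print(average_time(["cross"] * 100, 100))              # prints 2.0
    print(average_time(["up"] * 50 + ["down"] * 50, 100))  # prints 1.5
    # Sweep the exploration rate; no particular values are claimed here.
    for eps in (0.01, 0.05, 0.1, 0.2):
        print(eps, round(simulate(epsilon=eps), 3))
```

Comparing the value returned by `simulate` for several `epsilon` values against the two reference points (2.0 and 1.5) is one way to probe the exploration effect the abstract describes. The sketch makes no claim to reproduce the paper's results.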

Affiliation:	Computational Social Science Group, ETH Zürich, Zürich, Switzerland
ORCID:	https://orcid.org/0000-0002-4383-7279
Volume:	12
Pages:	15984-15996
DOI:	10.1109/ACCESS.2024.3358608
ISSN:	2169-3536
Collection:	Directory of Open Access Journals (DOAJ)

Similar Items