Solving multi-armed bandit problems using a chaotic microresonator comb

The Multi-Armed Bandit (MAB) problem, foundational to reinforcement learning-based decision-making, addresses the challenge of maximizing rewards amid multiple uncertain choices. While algorithmic solutions are effective, their computational efficiency diminishes with increasing problem complexity....

Bibliographic Details
Main Authors:	Jonathan Cuevas, Ryugo Iwami, Atsushi Uchida, Kaoru Minoshima, Naoya Kuse
Format:	Article
Language:	English
Published:	AIP Publishing LLC 2024-03-01
Series:	APL Photonics
Online Access:	http://dx.doi.org/10.1063/5.0173287

author	Jonathan Cuevas, Ryugo Iwami, Atsushi Uchida, Kaoru Minoshima, Naoya Kuse
collection	DOAJ
description	The Multi-Armed Bandit (MAB) problem, foundational to reinforcement learning-based decision-making, addresses the challenge of maximizing rewards amid multiple uncertain choices. While algorithmic solutions are effective, their computational efficiency diminishes with increasing problem complexity. Photonic accelerators, leveraging temporal and spatial-temporal chaos, have emerged as promising alternatives. However, despite these advancements, current approaches either compromise computation speed or amplify system complexity. In this paper, we introduce a chaotic microresonator frequency comb (chaotic comb) to tackle the MAB problem, where each comb mode is assigned to a slot machine. Through a proof-of-concept experiment, we employ 44 comb modes to address an MAB with 44 slot machines, demonstrating performance competitive with both conventional software algorithms and other photonic methods. Furthermore, the scalability of decision making is explored with up to 512 slot machines using experimentally obtained temporal chaos in different time slots. Power-law scalability is achieved with an exponent of 0.96, outperforming conventional software-based algorithms. Moreover, we find that a numerically calculated chaotic comb accurately reproduces experimental results, paving the way for discussions on strategies to increase the number of slot machines.
first_indexed	2024-04-24T14:54:55Z
format	Article
id	doaj.art-c48f5ef70a4940f49a9cef0dacb63dc5
institution	Directory Open Access Journal
issn	2378-0967
language	English
last_indexed	2024-04-24T14:54:55Z
publishDate	2024-03-01
publisher	AIP Publishing LLC
record_format	Article
series	APL Photonics
spelling	doaj.art-c48f5ef70a4940f49a9cef0dacb63dc5; 2024-04-02T19:30:50Z; eng; AIP Publishing LLC; APL Photonics; 2378-0967; 2024-03-01; 9; 3; 036112; 036112-10; 10.1063/5.0173287; Solving multi-armed bandit problems using a chaotic microresonator comb
	Jonathan Cuevas (0): Graduate School of Sciences and Technology for Innovation, Tokushima University, 2-1, Minami-Josanjima, Tokushima 770-8506, Japan
	Ryugo Iwami (1): Department of Information and Computer Sciences, Saitama University, 255 Shimo-okubo, Sakura-ku, Saitama 338-8570, Japan
	Atsushi Uchida (2): Department of Information and Computer Sciences, Saitama University, 255 Shimo-okubo, Sakura-ku, Saitama 338-8570, Japan
	Kaoru Minoshima (3): Graduate School of Informatics and Engineering, The University of Electro-Communications, 1-5-1 Chofugaoka, Chofu, Tokyo 182-8585, Japan
	Naoya Kuse (4): Institute of Post-LED Photonics, Tokushima University, 2-1, Minami-Josanjima, Tokushima 770-8506, Japan
	http://dx.doi.org/10.1063/5.0173287
title	Solving multi-armed bandit problems using a chaotic microresonator comb
url	http://dx.doi.org/10.1063/5.0173287

Similar Items