Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory

<jats:p>We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t\diverge$) it pulls only the arm with the highest expected rew...

Full description

Bibliographic Details
Main Authors:	Su, Lili, Zubeldia, Martin, Lynch, Nancy
Other Authors:	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Format:	Article
Language:	English
Published:	Association for Computing Machinery (ACM) 2021
Online Access:	https://hdl.handle.net/1721.1/135402

_version_	1826204297197518848
author	Su, Lili Zubeldia, Martin Lynch, Nancy
author2	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
author_facet	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science Su, Lili Zubeldia, Martin Lynch, Nancy
author_sort	Su, Lili
collection	MIT
description	<jats:p>We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t\diverge$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, this goal can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. In this work, we model and analyze a type of learning dynamics which are well-observed in social groups. Specifically, under the learning dynamics of interest, an individual sequentially decides on which arm to pull next based on not only its private reward feedback but also the suggestion provided by a randomly chosen neighbor. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the \em mean-field approximation method. Considering the possibility that the individuals in the networks may not be exchangeable when the communication networks are not cliques, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation:</jats:p> <jats:p>Using coupling we show that, if the communication graph is connected and is either regular or has doubly-stochastic degree-weighted adjacency matrix, with probability → 1 as the social group size N → ∞, every individual in the social group learns the best option.</jats:p> <jats:p>If the minimum degree of the graph diverges as N → ∞, over an arbitrary but given finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. Interestingly, the obtained system of ODEs are invariant to the structures of the communication graphs. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to 1 exponentially fast in time.</jats:p> <jats:p>Notably, our results hold even if the communication graphs are highly sparse.</jats:p>
first_indexed	2024-09-23T12:52:03Z
format	Article
id	mit-1721.1/135402
institution	Massachusetts Institute of Technology
language	English
last_indexed	2024-09-23T12:52:03Z
publishDate	2021
publisher	Association for Computing Machinery (ACM)
record_format	dspace
spelling	mit-1721.1/1354022023-11-08T21:36:09Z Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory Su, Lili Zubeldia, Martin Lynch, Nancy Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science <jats:p>We consider multi-armed bandit problems in social groups wherein each individual has bounded memory and shares the common goal of learning the best arm/option. We say an individual learns the best option if eventually (as $t\diverge$) it pulls only the arm with the highest expected reward. While this goal is provably impossible for an isolated individual due to bounded memory, we show that, in social groups, this goal can be achieved easily with the aid of social persuasion (i.e., communication) as long as the communication networks/graphs satisfy some mild conditions. In this work, we model and analyze a type of learning dynamics which are well-observed in social groups. Specifically, under the learning dynamics of interest, an individual sequentially decides on which arm to pull next based on not only its private reward feedback but also the suggestion provided by a randomly chosen neighbor. To deal with the interplay between the randomness in the rewards and in the social interaction, we employ the \em mean-field approximation method. Considering the possibility that the individuals in the networks may not be exchangeable when the communication networks are not cliques, we go beyond the classic mean-field techniques and apply a refined version of mean-field approximation:</jats:p> <jats:p>Using coupling we show that, if the communication graph is connected and is either regular or has doubly-stochastic degree-weighted adjacency matrix, with probability → 1 as the social group size N → ∞, every individual in the social group learns the best option.</jats:p> <jats:p>If the minimum degree of the graph diverges as N → ∞, over an arbitrary but given finite time horizon, the sample paths describing the opinion evolutions of the individuals are asymptotically independent. In addition, the proportions of the population with different opinions converge to the unique solution of a system of ODEs. Interestingly, the obtained system of ODEs are invariant to the structures of the communication graphs. In the solution of the obtained ODEs, the proportion of the population holding the correct opinion converges to 1 exponentially fast in time.</jats:p> <jats:p>Notably, our results hold even if the communication graphs are highly sparse.</jats:p> 2021-10-27T20:23:18Z 2021-10-27T20:23:18Z 2019 2021-01-29T13:52:17Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/135402 en 10.1145/3322205.3311082 Proceedings of the ACM on Measurement and Analysis of Computing Systems Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Association for Computing Machinery (ACM) arXiv
spellingShingle	Su, Lili Zubeldia, Martin Lynch, Nancy Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title	Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title_full	Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title_fullStr	Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title_full_unstemmed	Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title_short	Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory
title_sort	collaboratively learning the best option on graphs using bounded local memory
url	https://hdl.handle.net/1721.1/135402
work_keys_str_mv	AT sulili collaborativelylearningthebestoptionongraphsusingboundedlocalmemory AT zubeldiamartin collaborativelylearningthebestoptionongraphsusingboundedlocalmemory AT lynchnancy collaborativelylearningthebestoptionongraphsusingboundedlocalmemory

Collaboratively Learning the Best Option on Graphs, Using Bounded Local Memory

Similar Items