Decentralized cooperative stochastic bandits
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on a network of N agents. In our model, the reward distribution of each arm is the same for each agent and rewards are drawn independently across agents and time steps. In each round, each agent chooses an arm to...
Main Authors: | , , |
---|---|
Format: | Conference item |
Language: | English |
Published: | Neural Information Processing Systems Foundation, 2019 |