Optimistic Gittins indices
Starting with the Thompson sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases and, while several have been shown to be regret optimal, their design has no...
Main Authors: | , |
---|---|
Other Authors: | |
Format: | Article |
Published: | NIPS Foundation 2020 |
Online Access: | https://hdl.handle.net/1721.1/128464 |