Optimistic gittins indices

Starting with the Thomspon sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases and while several have been shown to be regret optimal, their design has no...

Full description

Bibliographic Details
Main Authors:	Gutin, Eli, Farias, Vivek F.
Other Authors:	Sloan School of Management
Format:	Article
Published:	NIPS Foundation 2020
Online Access:	https://hdl.handle.net/1721.1/128464

_version_	1826201884423094272
author	Gutin, Eli Farias, Vivek F.
author2	Sloan School of Management
author_facet	Sloan School of Management Gutin, Eli Farias, Vivek F.
author_sort	Gutin, Eli
collection	MIT
description	Starting with the Thomspon sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases and while several have been shown to be regret optimal, their design has not emerged from a principled approach. In contrast, if one cared about Bayesian regret discounted over an infinite horizon at a fixed, pre-specified rate, the celebrated Gittins index theorem offers an optimal algorithm. Unfortunately, the Gittins analysis does not appear to carry over to minimizing Bayesian regret over all sufficiently large horizons and computing a Gittins index is onerous relative to essentially any incumbent index scheme for the Bayesian MAB problem. The present paper proposes a sequence of 'optimistic' approximations to the Gittins index. We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian MAB problem in recent years by offering substantially improved performance with little to no additional computational overhead. In addition, we prove that the simplest of these approximations yields frequentist regret that matches the Lai-Robbins lower bound, including achieving matching constants.
first_indexed	2024-09-23T11:58:44Z
format	Article
id	mit-1721.1/128464
institution	Massachusetts Institute of Technology
last_indexed	2024-09-23T11:58:44Z
publishDate	2020
publisher	NIPS Foundation
record_format	dspace
spelling	mit-1721.1/1284642022-09-27T23:13:25Z Optimistic gittins indices Gutin, Eli Farias, Vivek F. Sloan School of Management Massachusetts Institute of Technology. Operations Research Center Starting with the Thomspon sampling algorithm, recent years have seen a resurgence of interest in Bayesian algorithms for the Multi-armed Bandit (MAB) problem. These algorithms seek to exploit prior information on arm biases and while several have been shown to be regret optimal, their design has not emerged from a principled approach. In contrast, if one cared about Bayesian regret discounted over an infinite horizon at a fixed, pre-specified rate, the celebrated Gittins index theorem offers an optimal algorithm. Unfortunately, the Gittins analysis does not appear to carry over to minimizing Bayesian regret over all sufficiently large horizons and computing a Gittins index is onerous relative to essentially any incumbent index scheme for the Bayesian MAB problem. The present paper proposes a sequence of 'optimistic' approximations to the Gittins index. We show that the use of these approximations in concert with the use of an increasing discount factor appears to offer a compelling alternative to state-of-the-art index schemes proposed for the Bayesian MAB problem in recent years by offering substantially improved performance with little to no additional computational overhead. In addition, we prove that the simplest of these approximations yields frequentist regret that matches the Lai-Robbins lower bound, including achieving matching constants. 2020-11-12T20:39:49Z 2020-11-12T20:39:49Z 2016-12 2019-02-12T15:33:33Z Article http://purl.org/eprint/type/ConferencePaper https://hdl.handle.net/1721.1/128464 Gutin, Eli and Vivek F. Farias. "Optimistic gittins indices." Advances in Neural Information Processing Systems 29 (NIPS 2016), Barcelona, Spain, NIPS Foundation, December 2016. © 2016 NIPS Foundation https://papers.nips.cc/paper/6036-optimistic-gittins-indices Advances in Neural Information Processing Systems 29 (NIPS 2016) Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use. application/pdf NIPS Foundation Neural Information Processing Systems (NIPS)
spellingShingle	Gutin, Eli Farias, Vivek F. Optimistic gittins indices
title	Optimistic gittins indices
title_full	Optimistic gittins indices
title_fullStr	Optimistic gittins indices
title_full_unstemmed	Optimistic gittins indices
title_short	Optimistic gittins indices
title_sort	optimistic gittins indices
url	https://hdl.handle.net/1721.1/128464
work_keys_str_mv	AT gutineli optimisticgittinsindices AT fariasvivekf optimisticgittinsindices

Optimistic gittins indices

Similar Items