Batched Bandit Problems

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regr...

Full description

Bibliographic Details
Main Authors: Perchet, Vianney, Rigollet, Philippe, Chassang, Sylvain, Snowberg, Erik
Other Authors: Massachusetts Institute of Technology. Department of Mathematics
Format: Article
Language:en_US
Published: Institute of Mathematical Statistics 2015
Online Access:http://hdl.handle.net/1721.1/98879
https://orcid.org/0000-0002-0135-7162
_version_ 1826208614410354688
author Perchet, Vianney
Rigollet, Philippe
Chassang, Sylvain
Snowberg, Erik
author2 Massachusetts Institute of Technology. Department of Mathematics
author_facet Massachusetts Institute of Technology. Department of Mathematics
Perchet, Vianney
Rigollet, Philippe
Chassang, Sylvain
Snowberg, Erik
author_sort Perchet, Vianney
collection MIT
description Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
first_indexed 2024-09-23T14:08:24Z
format Article
id mit-1721.1/98879
institution Massachusetts Institute of Technology
language en_US
last_indexed 2024-09-23T14:08:24Z
publishDate 2015
publisher Institute of Mathematical Statistics
record_format dspace
spelling mit-1721.1/988792022-09-28T18:45:57Z Batched Bandit Problems Perchet, Vianney Rigollet, Philippe Chassang, Sylvain Snowberg, Erik Massachusetts Institute of Technology. Department of Mathematics Rigollet, Philippe Rigollet, Philippe Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits. National Science Foundation (U.S.) (Grant DMS-1317308) National Science Foundation (U.S.) (CAREER-DMS-1053987) Meimaris Family 2015-09-24T12:51:14Z 2015-09-24T12:51:14Z 2015-09-24 Article http://purl.org/eprint/type/JournalArticle 0090-5364 http://hdl.handle.net/1721.1/98879 Perchet, Vianney, Philippe Rigollet, Sylvain Chassang, and Erik Snowberg. "Batched Bandit Problems." Annals of Statistics (2015). https://orcid.org/0000-0002-0135-7162 en_US forthcoming in Annals of Statistics Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Mathematical Statistics arXiv
spellingShingle Perchet, Vianney
Rigollet, Philippe
Chassang, Sylvain
Snowberg, Erik
Batched Bandit Problems
title Batched Bandit Problems
title_full Batched Bandit Problems
title_fullStr Batched Bandit Problems
title_full_unstemmed Batched Bandit Problems
title_short Batched Bandit Problems
title_sort batched bandit problems
url http://hdl.handle.net/1721.1/98879
https://orcid.org/0000-0002-0135-7162
work_keys_str_mv AT perchetvianney batchedbanditproblems
AT rigolletphilippe batchedbanditproblems
AT chassangsylvain batchedbanditproblems
AT snowbergerik batchedbanditproblems