Batched Bandit Problems
Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regr...
Main Authors: | , , , |
---|---|
Other Authors: | |
Format: | Article |
Language: | en_US |
Published: |
Institute of Mathematical Statistics
2015
|
Online Access: | http://hdl.handle.net/1721.1/98879 https://orcid.org/0000-0002-0135-7162 |
_version_ | 1826208614410354688 |
---|---|
author | Perchet, Vianney Rigollet, Philippe Chassang, Sylvain Snowberg, Erik |
author2 | Massachusetts Institute of Technology. Department of Mathematics |
author_facet | Massachusetts Institute of Technology. Department of Mathematics Perchet, Vianney Rigollet, Philippe Chassang, Sylvain Snowberg, Erik |
author_sort | Perchet, Vianney |
collection | MIT |
description | Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits. |
first_indexed | 2024-09-23T14:08:24Z |
format | Article |
id | mit-1721.1/98879 |
institution | Massachusetts Institute of Technology |
language | en_US |
last_indexed | 2024-09-23T14:08:24Z |
publishDate | 2015 |
publisher | Institute of Mathematical Statistics |
record_format | dspace |
spelling | mit-1721.1/988792022-09-28T18:45:57Z Batched Bandit Problems Perchet, Vianney Rigollet, Philippe Chassang, Sylvain Snowberg, Erik Massachusetts Institute of Technology. Department of Mathematics Rigollet, Philippe Rigollet, Philippe Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. Our results show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits. National Science Foundation (U.S.) (Grant DMS-1317308) National Science Foundation (U.S.) (CAREER-DMS-1053987) Meimaris Family 2015-09-24T12:51:14Z 2015-09-24T12:51:14Z 2015-09-24 Article http://purl.org/eprint/type/JournalArticle 0090-5364 http://hdl.handle.net/1721.1/98879 Perchet, Vianney, Philippe Rigollet, Sylvain Chassang, and Erik Snowberg. "Batched Bandit Problems." Annals of Statistics (2015). https://orcid.org/0000-0002-0135-7162 en_US forthcoming in Annals of Statistics Creative Commons Attribution-Noncommercial-Share Alike http://creativecommons.org/licenses/by-nc-sa/4.0/ application/pdf Institute of Mathematical Statistics arXiv |
spellingShingle | Perchet, Vianney Rigollet, Philippe Chassang, Sylvain Snowberg, Erik Batched Bandit Problems |
title | Batched Bandit Problems |
title_full | Batched Bandit Problems |
title_fullStr | Batched Bandit Problems |
title_full_unstemmed | Batched Bandit Problems |
title_short | Batched Bandit Problems |
title_sort | batched bandit problems |
url | http://hdl.handle.net/1721.1/98879 https://orcid.org/0000-0002-0135-7162 |
work_keys_str_mv | AT perchetvianney batchedbanditproblems AT rigolletphilippe batchedbanditproblems AT chassangsylvain batchedbanditproblems AT snowbergerik batchedbanditproblems |