Learning to Optimize Under Non-Stationarity


Bibliographic Details
Main Authors: Cheung, Wang Chi, Simchi-Levi, David, Zhu, Ruihao
Other Authors: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Format: Article
Language: English
Published: Elsevier BV 2021
Online Access: https://hdl.handle.net/1721.1/137064
_version_ 1826190567853260800
author Cheung, Wang Chi
Simchi-Levi, David
Zhu, Ruihao
author2 Massachusetts Institute of Technology. Institute for Data, Systems, and Society
author_facet Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Cheung, Wang Chi
Simchi-Levi, David
Zhu, Ruihao
author_sort Cheung, Wang Chi
collection MIT
description © 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting, which captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non-stationarity can be overcome by a novel marriage between stochastic and adversarial bandit learning algorithms. Defining d, B_T, and T as the problem dimension, the variation budget, and the total time horizon, respectively, our main contributions are the tuned Sliding Window UCB (SW-UCB) algorithm with optimal Õ(d^(2/3) (B_T + 1)^(1/3) T^(2/3)) dynamic regret, and the tuning-free Bandit-over-Bandit (BOB) framework, built on top of the SW-UCB algorithm, with Õ(d^(2/3) (B_T + 1)^(1/4) T^(3/4)) dynamic regret.
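The sliding-window idea in the abstract — estimating the reward parameter from only the most recent observations so that stale data from a drifted environment is discarded — can be sketched as follows. This is an illustrative reimplementation, not the authors' code: the names `sw_ucb_step`, `history`, and the parameters `w`, `lam`, and `beta` are hypothetical placeholders (the paper tunes the window size and confidence width from d, T, and B_T).

```python
import numpy as np

def sw_ucb_step(history, arms, w, lam=1.0, beta=1.0):
    """One round of a sliding-window UCB sketch for linear bandits.

    history: list of (x, r) pairs (feature vector, observed reward);
             only the most recent w pairs are used.
    arms:    array of candidate action feature vectors for this round.
    w, lam, beta: window size, ridge regularizer, confidence width.
    Returns the index of the arm with the largest optimistic reward.
    """
    d = arms.shape[1]
    V = lam * np.eye(d)          # regularized Gram matrix over the window
    b = np.zeros(d)
    for x, r in history[-w:]:    # discard observations older than w rounds
        V += np.outer(x, x)
        b += r * x
    theta_hat = np.linalg.solve(V, b)   # windowed ridge estimate of theta
    V_inv = np.linalg.inv(V)
    # optimistic index: estimated reward plus an exploration bonus
    # proportional to the estimator's uncertainty in each arm's direction
    bonus = np.sqrt(np.einsum('ij,jk,ik->i', arms, V_inv, arms))
    return int(np.argmax(arms @ theta_hat + beta * bonus))
```

The tuning-free BOB framework would sit one level above this: an adversarial bandit algorithm (e.g. EXP3) treats candidate window sizes `w` as its arms and adaptively selects which window the SW-UCB subroutine uses, removing the need to know B_T in advance.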
first_indexed 2024-09-23T08:41:58Z
format Article
id mit-1721.1/137064
institution Massachusetts Institute of Technology
language English
last_indexed 2024-09-23T08:41:58Z
publishDate 2021
publisher Elsevier BV
record_format dspace
spelling mit-1721.1/137064 (last modified 2023-01-30T21:21:03Z). Learning to Optimize Under Non-Stationarity. Authors: Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao. Affiliations: Massachusetts Institute of Technology. Institute for Data, Systems, and Society; Statistics and Data Science Center (Massachusetts Institute of Technology). Dates: accessioned 2021-11-02T12:19:41Z; issued 2018; updated 2020-06-02T16:54:50Z. Type: Article (http://purl.org/eprint/type/ConferencePaper). ISSN: 1556-5068. Handle: https://hdl.handle.net/1721.1/137064. Citation: Cheung, Wang Chi, Simchi-Levi, David and Zhu, Ruihao. 2018. "Learning to Optimize Under Non-Stationarity." AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics, 89. Language: en. DOI: 10.2139/ssrn.3261050. License: Creative Commons Attribution-Noncommercial-Share Alike (http://creativecommons.org/licenses/by-nc-sa/4.0/). Format: application/pdf. Publisher: Elsevier BV. Source: arXiv.
spellingShingle Cheung, Wang Chi
Simchi-Levi, David
Zhu, Ruihao
Learning to Optimize Under Non-Stationarity
title Learning to Optimize Under Non-Stationarity
title_full Learning to Optimize Under Non-Stationarity
title_fullStr Learning to Optimize Under Non-Stationarity
title_full_unstemmed Learning to Optimize Under Non-Stationarity
title_short Learning to Optimize Under Non-Stationarity
title_sort learning to optimize under non stationarity
url https://hdl.handle.net/1721.1/137064
work_keys_str_mv AT cheungwangchi learningtooptimizeundernonstationarity
AT simchilevidavid learningtooptimizeundernonstationarity
AT zhuruihao learningtooptimizeundernonstationarity