Learning to Optimize Under Non-Stationarity
© 2019 by the author(s). We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-stationary linear stochastic bandit setting. This setting captures natural applications such as dynamic pricing and ads allocation in a changing environment. We show how the difficulty posed by the non...
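For orientation, the abstract concerns dynamic regret in non-stationary linear stochastic bandits, where the unknown reward parameter drifts over time. Below is a minimal sketch of a sliding-window linear UCB strategy of the kind studied in this literature; the drifting environment `theta_t`, window size `w`, and confidence radius `beta` are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

# Sketch: sliding-window linear UCB for a non-stationary linear bandit.
# Only the most recent w observations are used to estimate the drifting
# parameter, so stale data is gradually forgotten.

rng = np.random.default_rng(0)
d, T, w, lam, beta = 2, 2000, 200, 1.0, 1.0  # illustrative constants
arms = np.eye(d)  # two fixed unit-vector arms, for simplicity

def theta_t(t):
    # Slowly rotating unknown parameter (an assumed form of drift).
    angle = 2 * np.pi * t / T
    return np.array([np.cos(angle), np.sin(angle)])

history = []  # (action, reward) pairs; only the last w rounds are used
total_reward = 0.0
for t in range(T):
    recent = history[-w:]
    # Regularized least squares on the windowed data.
    V = lam * np.eye(d) + sum(np.outer(x, x) for x, _ in recent)
    b = sum(r * x for x, r in recent) if recent else np.zeros(d)
    theta_hat = np.linalg.solve(V, b)
    V_inv = np.linalg.inv(V)
    # Choose the arm with the largest optimistic (UCB) reward estimate.
    ucb = [x @ theta_hat + beta * np.sqrt(x @ V_inv @ x) for x in arms]
    x = arms[int(np.argmax(ucb))]
    reward = x @ theta_t(t) + 0.1 * rng.standard_normal()
    history.append((x, reward))
    total_reward += reward

print(f"average reward over {T} rounds: {total_reward / T:.3f}")
```

The window length trades off bias against variance: a short window tracks drift quickly but estimates noisily, while a long window estimates well only when the environment is nearly stationary.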
Main Authors: Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao
Other Authors: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Format: Article
Language: English
Published: Elsevier BV, 2021
Online Access: https://hdl.handle.net/1721.1/137064
Similar Items
- Hedging the Drift: Learning to Optimize Under Nonstationarity
  by: Cheung, Wang Chi, et al.
  Published: (2023)
- Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
  by: Cheung, Wang Chi, et al.
  Published: (2021)
- Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
  by: Mao, Weichao, et al.
  Published: (2023)
- Meta Dynamic Pricing: Transfer Learning Across Experiments
  by: Bastani, Hamsa, et al.
  Published: (2023)
- Transient non-stationarity and generalisation in deep reinforcement learning
  by: Igl, M, et al.
  Published: (2021)