Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism
We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do no...
| Main Authors: | Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao |
|---|---|
| Other Authors: | Massachusetts Institute of Technology. Institute for Data, Systems, and Society |
| Format: | Article |
| Language: | English |
| Published: | 2021 |
| Online Access: | https://hdl.handle.net/1721.1/137255 |
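For orientation, the "variation budgets" referred to in the abstract are commonly formalized as cumulative bounds on how much the rewards and transition kernels may drift over the horizon. The following is a minimal sketch assuming the standard definitions used in this line of work; the truncated abstract above does not state the exact metrics, so these forms are an assumption.

```latex
% Hedged sketch (assumed standard forms, not quoted from the abstract):
% reward and transition variation budgets over a horizon of T rounds.
\[
  B_r \;=\; \sum_{t=1}^{T-1} \max_{s,a} \bigl| r_{t+1}(s,a) - r_t(s,a) \bigr|,
  \qquad
  B_p \;=\; \sum_{t=1}^{T-1} \max_{s,a} \bigl\| p_{t+1}(\cdot \mid s,a) - p_t(\cdot \mid s,a) \bigr\|_{1}.
\]
% Under this reading, the reward functions r_t and transition kernels p_t
% may change arbitrarily from round to round, provided their total
% variations stay within the budgets B_r and B_p respectively.
```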
Similar Items
- Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs
  by: Mao, Weichao, et al.
  Published: (2023)
- Learning to Optimize Under Non-Stationarity
  by: Cheung, Wang Chi, et al.
  Published: (2021)
- Hedging the Drift: Learning to Optimize Under Nonstationarity
  by: Cheung, Wang Chi, et al.
  Published: (2023)
- Markov abstractions for PAC reinforcement learning in non-Markov decision processes
  by: Ronca, A., et al.
  Published: (2022)
- Polynomial time algorithms for finite horizon, stationary Markov decision processes
  Published: (2003)