Reinforcement Learning for Non-Stationary Markov Decision Processes: The Blessing of (More) Optimism

We consider un-discounted reinforcement learning (RL) in Markov decision processes (MDPs) under drifting non-stationarity, i.e., both the reward and state transition distributions are allowed to evolve over time, as long as their respective total variations, quantified by suitable metrics, do no...

Bibliographic Details
Main Authors: Cheung, Wang Chi; Simchi-Levi, David; Zhu, Ruihao
Other Authors: Massachusetts Institute of Technology. Institute for Data, Systems, and Society
Format: Article
Language: English
Published: 2021
Online Access: https://hdl.handle.net/1721.1/137255