Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs

In this work, we analyze finite sample complexity bounds for offline reinforcement learning with general state space, general function space and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that o...

Bibliographic details
Main authors: Jayanth Reddy Regatti, Abhishek Gupta
Format: Article
Language: English
Published: IEEE 2022-01-01
Series: IEEE Open Journal of Control Systems
Subjects:
Links: https://ieeexplore.ieee.org/document/9857559/