Provably efficient offline reinforcement learning in regular decision processes
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.
Main Authors: | Cipollone, R; Jonsson, A; Ronca, A; Talebi, MS |
---|---|
Format: | Conference item |
Language: | English |
Published: | Neural Information Processing Systems Foundation, 2024 |
---|---|
author | Cipollone, R; Jonsson, A; Ronca, A; Talebi, MS |
collection | OXFORD |
description | This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs. |
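The abstract describes a two-stage modular pipeline: first learn a finite-state automaton from the non-Markov offline traces, then relabel the data with automaton states and hand the now-Markovian dataset to any off-the-shelf offline RL algorithm. The sketch below illustrates only that modular structure; every function here is hypothetical (RegORL's actual automaton-learning and offline-RL components are specified in the paper), and the "automaton" is a crude stand-in that identifies a history with its last observation.

```python
from collections import defaultdict

def learn_automaton(traces):
    # Hypothetical stand-in for the automaton-learning stage: map each
    # history prefix to a state. Here we merely use the last observation
    # as the state; real RDP learning merges prefixes with equivalent
    # futures into automaton states.
    state_of = {}
    for trace in traces:
        hist = ()
        for (obs, act, rew) in trace:
            hist = hist + (obs,)
            state_of.setdefault(hist, hist[-1])
    return state_of

def relabel(traces, state_of):
    # Rewrite each non-Markov trace as (state, action, reward, next_state)
    # tuples, which any standard offline RL method can consume.
    data = []
    for trace in traces:
        hist, prev = (), None
        for (obs, act, rew) in trace:
            hist = hist + (obs,)
            s = state_of[hist]
            if prev is not None:
                ps, pa, pr = prev
                data.append((ps, pa, pr, s))
            prev = (s, act, rew)
        if prev is not None:
            ps, pa, pr = prev
            data.append((ps, pa, pr, None))  # end of episode
    return data

def offline_q(data, gamma=0.9, iters=50):
    # Placeholder for the off-the-shelf offline RL stage: plain fitted
    # Q-iteration on the empirical transitions.
    q = defaultdict(float)
    for _ in range(iters):
        by_sa = defaultdict(list)
        for (s, a, r, s2) in data:
            by_sa[(s, a)].append((r, s2))
        newq = defaultdict(float)
        for (s, a), outcomes in by_sa.items():
            vals = []
            for (r, s2) in outcomes:
                future = 0.0 if s2 is None else max(
                    (q[(ss, b)] for (ss, b) in q if ss == s2), default=0.0)
                vals.append(r + gamma * future)
            newq[(s, a)] = sum(vals) / len(vals)
        q = newq
    return q
```

Because the relabeled dataset is an ordinary batch of MDP transitions, the last stage can be swapped for any pessimistic offline RL algorithm, which is the modularity the abstract emphasizes.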
format | Conference item |
id | oxford-uuid:14b8e2a1-f24e-4cc5-80ad-db6254094916 |
institution | University of Oxford |
language | English |
publishDate | 2024 |
publisher | Neural Information Processing Systems Foundation |
record_format | dspace |
title | Provably efficient offline reinforcement learning in regular decision processes |