Provably efficient offline reinforcement learning in regular decision processes

Full description

This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes in which the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton underlying the RDP is unknown, and a learner strives to learn a near-optimal policy from pre-collected data, given as non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata-learning techniques with state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design that allows one to plug in any off-the-shelf offline RL algorithm for MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy; the bound brings out a notion of concentrability relevant to RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work to present a provably efficient algorithm for offline learning in RDPs.
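
For context, the two terms of art in the abstract can be read against their standard offline-RL meanings; these are the usual MDP definitions, not the paper's exact RDP-specific ones (the abstract says the paper introduces a concentrability notion tailored to RDPs). A learned policy is ε-optimal when its value is within ε of the optimal value, and single-policy concentrability bounds how far the optimal policy's occupancy strays from the data distribution μ:

    \[
      V^{\hat{\pi}} \;\ge\; V^{\star} - \varepsilon
      \qquad \text{($\varepsilon$-optimality of the learned policy $\hat{\pi}$)}
    \]
    \[
      C^{\star} \;=\; \max_{s,a}\, \frac{d^{\pi^{\star}}(s,a)}{\mu(s,a)}
      \qquad \text{(single-policy concentrability w.r.t.\ the data distribution $\mu$)}
    \]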

Bibliographic Details
Main Authors: Cipollone, R; Jonsson, A; Ronca, A; Talebi, MS
Format: Conference item
Language: English
Published: Neural Information Processing Systems Foundation, 2024
Institution: University of Oxford
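
The abstract describes RegORL as a modular two-stage method: first learn the RDP's underlying automaton from the non-Markov traces, then hand the Markovian dataset the automaton induces to any off-the-shelf offline RL algorithm for MDPs. The sketch below is a minimal illustration of that structure only; every name in it (Automaton, learn_automaton, relabel, reg_orl, the observation-only transition alphabet, the stub learner) is a hypothetical stand-in, not the authors' implementation.

    # Illustrative sketch of a two-stage pipeline in the spirit of RegORL.
    # All interfaces here are hypothetical assumptions, not the paper's API.
    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Optional, Tuple

    Episode = List[Tuple[int, int, float]]              # (observation, action, reward)
    Transition = Tuple[int, int, float, Optional[int]]  # (state, action, reward, next state)


    @dataclass
    class Automaton:
        """Finite-state machine tracking the reward-relevant part of the history."""
        initial_state: int = 0
        delta: Dict[Tuple[int, int], int] = field(default_factory=dict)

        def step(self, state: int, obs: int) -> int:
            # Unseen (state, observation) pairs self-loop in this toy sketch.
            return self.delta.get((state, obs), state)


    def learn_automaton(dataset: List[Episode]) -> Automaton:
        """Stage 1 (placeholder): an automata-learning routine that recovers the
        RDP's states from non-Markov traces would go here; this stub returns a
        trivial one-state automaton so the sketch runs end to end."""
        return Automaton()


    def relabel(dataset: List[Episode], automaton: Automaton) -> List[Transition]:
        """Replay each trace through the automaton; its state then serves as the
        Markov state of the induced MDP."""
        markov_dataset: List[Transition] = []
        for episode in dataset:
            q, states = automaton.initial_state, []
            for obs, _, _ in episode:                   # q_t summarizes o_0 ... o_t
                q = automaton.step(q, obs)
                states.append(q)
            for t, (_, action, reward) in enumerate(episode):
                nxt = states[t + 1] if t + 1 < len(episode) else None  # episode end
                markov_dataset.append((states[t], action, reward, nxt))
        return markov_dataset


    def reg_orl(dataset: List[Episode],
                offline_mdp_solver: Callable[[List[Transition]], Callable[[int], int]]):
        """Stage 2: hand the induced Markovian dataset to any off-the-shelf offline
        RL algorithm for MDPs, reflecting the modular design the abstract notes."""
        automaton = learn_automaton(dataset)
        policy = offline_mdp_solver(relabel(dataset, automaton))
        return policy, automaton                        # policy acts on automaton states

The design point this is meant to surface is the one the abstract emphasizes: because the automaton reduces the non-Markov problem to an MDP over its states, the offline solver is a swappable component rather than part of the algorithm.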