Provably efficient offline reinforcement learning in regular decision processes
This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.
Main Authors: | Cipollone, R; Jonsson, A; Ronca, A; Talebi, MS |
---|---|
Format: | Conference item |
Language: | English |
Published: | Neural Information Processing Systems Foundation, 2024 |
---|---|
author | Cipollone, R; Jonsson, A; Ronca, A; Talebi, MS |
collection | OXFORD |
description | This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an ε-optimal policy, which brings out a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To the best of our knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs. |
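The abstract describes a two-stage modular pipeline: first learn a finite-state automaton from the non-Markov offline traces, then relabel the data with automaton states and hand the now-Markovian dataset to any off-the-shelf offline RL algorithm. The sketch below illustrates only that modular structure; every function here is hypothetical (RegORL's actual automaton-learning and offline-RL components are specified in the paper), and the "automaton" is a crude stand-in that identifies a history with its last observation.

```python
from collections import defaultdict

def learn_automaton(traces):
    # Hypothetical stand-in for the automaton-learning stage: map each
    # history prefix to a state. Here we merely use the last observation
    # as the state; real RDP learning merges prefixes with equivalent
    # futures into automaton states.
    state_of = {}
    for trace in traces:
        hist = ()
        for (obs, act, rew) in trace:
            hist = hist + (obs,)
            state_of.setdefault(hist, hist[-1])
    return state_of

def relabel(traces, state_of):
    # Rewrite each non-Markov trace as (state, action, reward, next_state)
    # tuples, which any standard offline RL method can consume.
    data = []
    for trace in traces:
        hist, prev = (), None
        for (obs, act, rew) in trace:
            hist = hist + (obs,)
            s = state_of[hist]
            if prev is not None:
                ps, pa, pr = prev
                data.append((ps, pa, pr, s))
            prev = (s, act, rew)
        if prev is not None:
            ps, pa, pr = prev
            data.append((ps, pa, pr, None))  # end of episode
    return data

def offline_q(data, gamma=0.9, iters=50):
    # Placeholder for the off-the-shelf offline RL stage: plain fitted
    # Q-iteration on the empirical transitions.
    q = defaultdict(float)
    for _ in range(iters):
        by_sa = defaultdict(list)
        for (s, a, r, s2) in data:
            by_sa[(s, a)].append((r, s2))
        newq = defaultdict(float)
        for (s, a), outcomes in by_sa.items():
            vals = []
            for (r, s2) in outcomes:
                future = 0.0 if s2 is None else max(
                    (q[(ss, b)] for (ss, b) in q if ss == s2), default=0.0)
                vals.append(r + gamma * future)
            newq[(s, a)] = sum(vals) / len(vals)
        q = newq
    return q
```

Because the relabeled dataset is an ordinary batch of MDP transitions, the last stage can be swapped for any pessimistic offline RL algorithm, which is the modularity the abstract emphasizes.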
format | Conference item |
id | oxford-uuid:14b8e2a1-f24e-4cc5-80ad-db6254094916 |
institution | University of Oxford |
language | English |
publishDate | 2024 |
publisher | Neural Information Processing Systems Foundation |
record_format | dspace |
title | Provably efficient offline reinforcement learning in regular decision processes |