Efficient PAC reinforcement learning in regular decision processes


Bibliographic Details
Main Authors: Ronca, A, De Giacomo, G
Format: Conference item
Language: English
Published: International Joint Conferences on Artificial Intelligence, 2021
Institution: University of Oxford
Record ID: oxford-uuid:78f99f76-f041-4e57-bc41-6a14f384e8aa

Description: Regular decision processes have recently been proposed as a well-behaved form of non-Markov decision process. They are characterised by a transition function and a reward function that depend on the whole history, though in a regular way (as in regular languages); in practice, both functions can be represented as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters that describe the underlying decision process. We argue that this set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
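The description notes that a regular decision process's transition and reward functions depend on the whole history but can be computed by finite transducers. A minimal sketch of that idea follows; it is a toy example, not from the paper, and the class name, symbols, and parity rule are all hypothetical:

```python
# Minimal sketch (not from the paper): a reward function over whole
# histories that is "regular", i.e. computable by a finite transducer.
# Hypothetical toy rule: action 'a' earns reward 1.0 iff the history
# seen so far contains an even number of 'x' observations.

class RewardTransducer:
    """Finite-state machine reading observations, emitting rewards."""

    def __init__(self):
        self.state = 'even'  # parity of 'x' observations seen so far

    def step(self, observation, action):
        # Transition on the observation, regular-language style.
        if observation == 'x':
            self.state = 'odd' if self.state == 'even' else 'even'
        # The reward depends only on the current state and action, yet
        # as a function of the raw history it is non-Markovian.
        return 1.0 if (self.state == 'even' and action == 'a') else 0.0

transducer = RewardTransducer()
history = [('x', 'a'), ('y', 'a'), ('x', 'a')]
rewards = [transducer.step(obs, act) for obs, act in history]
print(rewards)  # [0.0, 0.0, 1.0]
```

Although every reward depends on the entire history, the transducer only ever keeps a constant amount of state, which is what makes such processes amenable to efficient learning.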