Efficient PAC reinforcement learning in regular decision processes
Recently, regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice, both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and that it reasonably captures the difficulty of a regular decision process.
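The abstract notes that history-dependent transition and reward functions can be viewed as finite transducers. A minimal sketch (not from the paper; the machine and its states are hypothetical) of a reward function computed by a Mealy-style transducer, depending on the whole history but only through a regular property of it:

```python
# Sketch of a "regular" reward function: it depends on the entire action
# history, yet is computable by a finite transducer (Mealy machine),
# reading the history one action at a time with finitely many states.

from typing import Dict, List, Tuple

class RewardTransducer:
    """Finite transducer: (state, action) -> (next state, reward)."""

    def __init__(self, start: str,
                 delta: Dict[Tuple[str, str], Tuple[str, float]]):
        self.start = start
        self.delta = delta

    def rewards(self, history: List[str]) -> List[float]:
        """Map an action history to the reward emitted at each step."""
        state, out = self.start, []
        for action in history:
            state, r = self.delta[(state, action)]
            out.append(r)
        return out

# Hypothetical example: reward 1 exactly when the number of 'a' actions
# seen so far is even -- a non-Markov but regular dependency on the whole
# history, tracked with only two states.
T = RewardTransducer(
    start="even",
    delta={
        ("even", "a"): ("odd", 0.0),
        ("even", "b"): ("even", 1.0),
        ("odd", "a"): ("even", 1.0),
        ("odd", "b"): ("odd", 0.0),
    },
)
```

For instance, `T.rewards(["a", "a", "b"])` yields `[0.0, 1.0, 1.0]`: the reward at each step is determined by the parity of 'a's in the entire prefix, which no fixed-order Markov model over observations alone could capture, yet two transducer states suffice.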
Main Authors: | Ronca, A; De Giacomo, G |
---|---|
Format: | Conference item |
Language: | English |
Published: | International Joint Conferences on Artificial Intelligence, 2021 |
author | Ronca, A De Giacomo, G |
collection | OXFORD |
description | Recently regular decision processes have been proposed as a well-behaved form of non-Markov decision process. Regular decision processes are characterised by a transition function and a reward function that depend on the whole history, though regularly (as in regular languages). In practice both the transition and the reward functions can be seen as finite transducers. We study reinforcement learning in regular decision processes. Our main contribution is to show that a near-optimal policy can be PAC-learned in polynomial time in a set of parameters that describe the underlying decision process. We argue that the identified set of parameters is minimal and it reasonably captures the difficulty of a regular decision process. |
format | Conference item |
id | oxford-uuid:78f99f76-f041-4e57-bc41-6a14f384e8aa |
institution | University of Oxford |
language | English |
publishDate | 2021 |
publisher | International Joint Conferences on Artificial Intelligence |
record_format | dspace |
title | Efficient PAC reinforcement learning in regular decision processes |