Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees


Bibliographic Details
Main Authors: Hasanbeig, M; Kantaros, Y; Abate, A; Kroening, D; Pappas, G; Lee, I
Format: Conference item
Language: English
Published: IEEE, 2020
Description: We present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method.
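The pipeline summarised in the description (translate the specification into an automaton, build the product with the MDP on the fly, reward accepting transitions, then run ordinary RL) can be illustrated with a deliberately tiny sketch. Everything below — the four-state line-world MDP, the hand-coded two-state automaton standing in for an LDBA, the labelling map, and the hyperparameters — is a hypothetical toy for intuition, not the authors' implementation:

```python
# Toy sketch: tabular Q-learning on the product of a small labelled MDP with a
# two-state automaton; reward is emitted only on accepting automaton transitions.
import random

random.seed(0)

N_STATES, ACTIONS = 4, [-1, +1]   # states 0..3 in a line; move left/right
LABEL = {3: "goal"}               # state 3 carries the atomic proposition "goal"

def automaton_step(q, label):
    """Hand-coded stand-in for an LDBA tracking 'reach goal, keep revisiting it'.
    Returns (next automaton state, whether the transition was accepting)."""
    if label == "goal":
        return 1, True            # accepting transition
    return q, False

def mdp_step(s, a):
    """Noisy dynamics: the chosen action slips with probability 0.1."""
    a = -a if random.random() < 0.1 else a
    return min(max(s + a, 0), N_STATES - 1)

Q = {}                            # Q-table over product states (s, q) and actions

def best(s, q):
    return max(ACTIONS, key=lambda a: Q.get((s, q, a), 0.0))

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(2000):
    s, q = 0, 0                   # product state built on the fly, never enumerated
    for _ in range(30):
        a = random.choice(ACTIONS) if random.random() < eps else best(s, q)
        s2 = mdp_step(s, a)
        q2, accepting = automaton_step(q, LABEL.get(s2))
        r = 1.0 if accepting else 0.0          # synchronous acceptance-based reward
        target = r + gamma * max(Q.get((s2, q2, b), 0.0) for b in ACTIONS)
        Q[(s, q, a)] = (1 - alpha) * Q.get((s, q, a), 0.0) + alpha * target
        s, q = s2, q2

print(best(0, 0))                 # greedy action at the start of the product
```

Because reward can only be collected by driving the product into accepting transitions, the greedy policy learned here heads right toward the labelled state; the paper's construction plays the same trick at full generality, with a genuine LDBA and probabilistic labels replacing the toy automaton and the fixed `LABEL` map.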
Institution: University of Oxford