Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees
We present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and...
Main Authors: | , , , , , |
---|---|
Format: | Conference item |
Language: | English |
Published: |
IEEE
2020
|
_version_ | 1826272610196914176 |
---|---|
author | Hasanbeig, M Kantaros, Y Abate, A Kroening, D Pappas, G Lee, I |
author_facet | Hasanbeig, M Kantaros, Y Abate, A Kroening, D Pappas, G Lee, I |
author_sort | Hasanbeig, M |
collection | OXFORD |
description | We present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method. |
first_indexed | 2024-03-06T22:15:18Z |
format | Conference item |
id | oxford-uuid:53301059-d6f2-49fe-9dd1-9b1b9aac944e |
institution | University of Oxford |
language | English |
last_indexed | 2024-03-06T22:15:18Z |
publishDate | 2020 |
publisher | IEEE |
record_format | dspace |
spelling | oxford-uuid:53301059-d6f2-49fe-9dd1-9b1b9aac944e2022-03-26T16:30:04ZReinforcement learning for temporal logic control synthesis with probabilistic satisfaction guaranteesConference itemhttp://purl.org/coar/resource_type/c_5794uuid:53301059-d6f2-49fe-9dd1-9b1b9aac944eEnglishSymplectic Elements at OxfordIEEE2020Hasanbeig, MKantaros, YAbate, AKroening, DPappas, GLee, IWe present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method. |
spellingShingle | Hasanbeig, M Kantaros, Y Abate, A Kroening, D Pappas, G Lee, I Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title | Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title_full | Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title_fullStr | Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title_full_unstemmed | Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title_short | Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
title_sort | reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees |
work_keys_str_mv | AT hasanbeigm reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees AT kantarosy reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees AT abatea reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees AT kroeningd reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees AT pappasg reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees AT leei reinforcementlearningfortemporallogiccontrolsynthesiswithprobabilisticsatisfactionguarantees |