Reinforcement learning for temporal logic control synthesis with probabilistic satisfaction guarantees


Bibliographic Details
Main Authors: Hasanbeig, M; Kantaros, Y; Abate, A; Kroening, D; Pappas, G; Lee, I
Format: Conference item
Language: English
Published: IEEE, 2020
Description: We present a model-free reinforcement learning algorithm to synthesize control policies that maximize the probability of satisfying high-level control objectives given as Linear Temporal Logic (LTL) formulas. Uncertainty is considered in the workspace properties, the structure of the workspace, and the agent actions, giving rise to a Probabilistically-Labeled Markov Decision Process (PL-MDP) with unknown graph structure and stochastic behaviour, which is even more general than a fully unknown MDP. We first translate the LTL specification into a Limit Deterministic Büchi Automaton (LDBA), which is then used in an on-the-fly product with the PL-MDP. Thereafter, we define a synchronous reward function based on the acceptance condition of the LDBA. Finally, we show that the RL algorithm delivers a policy that maximizes the satisfaction probability asymptotically. We provide experimental results that showcase the efficiency of the proposed method.
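The pipeline summarised in the description (translate the specification into an automaton, build the product with the MDP on the fly, reward accepting transitions, then run ordinary RL) can be illustrated with a deliberately tiny sketch. Everything below — the four-state line-world MDP, the hand-coded two-state automaton standing in for an LDBA, the labelling map, and the hyperparameters — is a hypothetical toy for intuition, not the authors' implementation:

```python
# Toy sketch: tabular Q-learning on the product of a small labelled MDP with a
# two-state automaton; reward is emitted only on accepting automaton transitions.
import random

random.seed(0)

N_STATES, ACTIONS = 4, [-1, +1]   # states 0..3 in a line; move left/right
LABEL = {3: "goal"}               # state 3 carries the atomic proposition "goal"

def automaton_step(q, label):
    """Hand-coded stand-in for an LDBA tracking 'reach goal, keep revisiting it'.
    Returns (next automaton state, whether the transition was accepting)."""
    if label == "goal":
        return 1, True            # accepting transition
    return q, False

def mdp_step(s, a):
    """Noisy dynamics: the chosen action slips with probability 0.1."""
    a = -a if random.random() < 0.1 else a
    return min(max(s + a, 0), N_STATES - 1)

Q = {}                            # Q-table over product states (s, q) and actions

def best(s, q):
    return max(ACTIONS, key=lambda a: Q.get((s, q, a), 0.0))

alpha, gamma, eps = 0.5, 0.95, 0.2
for _ in range(2000):
    s, q = 0, 0                   # product state built on the fly, never enumerated
    for _ in range(30):
        a = random.choice(ACTIONS) if random.random() < eps else best(s, q)
        s2 = mdp_step(s, a)
        q2, accepting = automaton_step(q, LABEL.get(s2))
        r = 1.0 if accepting else 0.0          # synchronous acceptance-based reward
        target = r + gamma * max(Q.get((s2, q2, b), 0.0) for b in ACTIONS)
        Q[(s, q, a)] = (1 - alpha) * Q.get((s, q, a), 0.0) + alpha * target
        s, q = s2, q2

print(best(0, 0))                 # greedy action at the start of the product
```

Because reward can only be collected by driving the product into accepting transitions, the greedy policy learned here heads right toward the labelled state; the paper's construction plays the same trick at full generality, with a genuine LDBA and probabilistic labels replacing the toy automaton and the fixed `LABEL` map.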
Institution: University of Oxford