A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition

The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environme...

Full description

Bibliographic Details
Main Authors: Wei Zhao, Zhiming Liu
Format: Article
Language:English
Published: PeerJ Inc. 2022-09-01
Series:PeerJ Computer Science
Subjects:
Online Access:https://peerj.com/articles/cs-1094.pdf
_version_ 1798036946961825792
author Wei Zhao
Zhiming Liu
author_facet Wei Zhao
Zhiming Liu
author_sort Wei Zhao
collection DOAJ
description The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.
first_indexed 2024-04-11T21:19:57Z
format Article
id doaj.art-f9b8144e8b674a7388acaee0f6649063
institution Directory Open Access Journal
issn 2376-5992
language English
last_indexed 2024-04-11T21:19:57Z
publishDate 2022-09-01
publisher PeerJ Inc.
record_format Article
series PeerJ Computer Science
spelling doaj.art-f9b8144e8b674a7388acaee0f66490632022-12-22T04:02:40ZengPeerJ Inc.PeerJ Computer Science2376-59922022-09-018e109410.7717/peerj-cs.1094A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning conditionWei Zhao0Zhiming Liu1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, ChinaSchool of Software, Northwestern Polytechnical University, Xi’an, Shaanxi, ChinaThe traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.https://peerj.com/articles/cs-1094.pdfStrategy synthesisReward mechanismReinforcement learningLinear temporal logicExpected cumulative reward
spellingShingle Wei Zhao
Zhiming Liu
A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
PeerJ Computer Science
Strategy synthesis
Reward mechanism
Reinforcement learning
Linear temporal logic
Expected cumulative reward
title A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_full A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_fullStr A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_full_unstemmed A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_short A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_sort learning based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
topic Strategy synthesis
Reward mechanism
Reinforcement learning
Linear temporal logic
Expected cumulative reward
url https://peerj.com/articles/cs-1094.pdf
work_keys_str_mv AT weizhao alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition
AT zhimingliu alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition
AT weizhao learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition
AT zhimingliu learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition