A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition

The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environme...

Full description

Bibliographic Details
Main Authors:	Wei Zhao, Zhiming Liu
Format:	Article
Language:	English
Published:	PeerJ Inc. 2022-09-01
Series:	PeerJ Computer Science
Subjects:	Strategy synthesis Reward mechanism Reinforcement learning Linear temporal logic Expected cumulative reward
Online Access:	https://peerj.com/articles/cs-1094.pdf

_version_	1798036946961825792
author	Wei Zhao Zhiming Liu
author_facet	Wei Zhao Zhiming Liu
author_sort	Wei Zhao
collection	DOAJ
description	The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.
first_indexed	2024-04-11T21:19:57Z
format	Article
id	doaj.art-f9b8144e8b674a7388acaee0f6649063
institution	Directory Open Access Journal
issn	2376-5992
language	English
last_indexed	2024-04-11T21:19:57Z
publishDate	2022-09-01
publisher	PeerJ Inc.
record_format	Article
series	PeerJ Computer Science
spelling	doaj.art-f9b8144e8b674a7388acaee0f66490632022-12-22T04:02:40ZengPeerJ Inc.PeerJ Computer Science2376-59922022-09-018e109410.7717/peerj-cs.1094A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning conditionWei Zhao0Zhiming Liu1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, ChinaSchool of Software, Northwestern Polytechnical University, Xi’an, Shaanxi, ChinaThe traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.https://peerj.com/articles/cs-1094.pdfStrategy synthesisReward mechanismReinforcement learningLinear temporal logicExpected cumulative reward
spellingShingle	Wei Zhao Zhiming Liu A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition PeerJ Computer Science Strategy synthesis Reward mechanism Reinforcement learning Linear temporal logic Expected cumulative reward
title	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_full	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_fullStr	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_full_unstemmed	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_short	A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
title_sort	learning based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
topic	Strategy synthesis Reward mechanism Reinforcement learning Linear temporal logic Expected cumulative reward
url	https://peerj.com/articles/cs-1094.pdf
work_keys_str_mv	AT weizhao alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT zhimingliu alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT weizhao learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT zhimingliu learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition

A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition

Similar Items