A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition
The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environme...
Main Authors: | , |
---|---|
Format: | Article |
Language: | English |
Published: |
PeerJ Inc.
2022-09-01
|
Series: | PeerJ Computer Science |
Subjects: | |
Online Access: | https://peerj.com/articles/cs-1094.pdf |
_version_ | 1798036946961825792 |
---|---|
author | Wei Zhao Zhiming Liu |
author_facet | Wei Zhao Zhiming Liu |
author_sort | Wei Zhao |
collection | DOAJ |
description | The traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving. |
first_indexed | 2024-04-11T21:19:57Z |
format | Article |
id | doaj.art-f9b8144e8b674a7388acaee0f6649063 |
institution | Directory Open Access Journal |
issn | 2376-5992 |
language | English |
last_indexed | 2024-04-11T21:19:57Z |
publishDate | 2022-09-01 |
publisher | PeerJ Inc. |
record_format | Article |
series | PeerJ Computer Science |
spelling | doaj.art-f9b8144e8b674a7388acaee0f66490632022-12-22T04:02:40ZengPeerJ Inc.PeerJ Computer Science2376-59922022-09-018e109410.7717/peerj-cs.1094A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning conditionWei Zhao0Zhiming Liu1College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu, ChinaSchool of Software, Northwestern Polytechnical University, Xi’an, Shaanxi, ChinaThe traditional synthesis problem is usually solved by constructing a system that fulfills given specifications. The system is constantly interacting with the environment and is opposed to the environment. The problem can be further regarded as solving a two-player game (the system and its environment). Meanwhile, stochastic games are often used to model reactive processes. With the development of the intelligent industry, these theories are extensively used in robot patrolling, intelligent logistics, and intelligent transportation. However, it is still challenging to find a practically feasible synthesis algorithm and generate the optimal system according to the existing research. Thus, it is desirable to design an incentive mechanism to motivate the system to fulfill given specifications. This work studies the learning-based approach for strategy synthesis of reward asynchronous probabilistic games against linear temporal logic (LTL) specifications in a probabilistic environment. An asynchronous reward mechanism is proposed to motivate players to gain maximized rewards by their positions and choose actions. Based on this mechanism, the techniques of the learning theory can be applied to transform the synthesis problem into the problem of computing the expected rewards. Then, it is proven that the reinforcement learning algorithm provides the optimal strategies that maximize the expected cumulative reward of the satisfaction of an LTL specification asymptotically. Finally, our techniques are implemented, and their effectiveness is illustrated by two case studies of robot patrolling and autonomous driving.https://peerj.com/articles/cs-1094.pdfStrategy synthesisReward mechanismReinforcement learningLinear temporal logicExpected cumulative reward |
spellingShingle | Wei Zhao Zhiming Liu A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition PeerJ Computer Science Strategy synthesis Reward mechanism Reinforcement learning Linear temporal logic Expected cumulative reward |
title | A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
title_full | A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
title_fullStr | A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
title_full_unstemmed | A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
title_short | A learning-based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
title_sort | learning based synthesis approach of reward asynchronous probabilistic games against the linear temporal logic winning condition |
topic | Strategy synthesis Reward mechanism Reinforcement learning Linear temporal logic Expected cumulative reward |
url | https://peerj.com/articles/cs-1094.pdf |
work_keys_str_mv | AT weizhao alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT zhimingliu alearningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT weizhao learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition AT zhimingliu learningbasedsynthesisapproachofrewardasynchronousprobabilisticgamesagainstthelineartemporallogicwinningcondition |