Reachability in recursive Markov decision processes.

We consider a class of infinite-state Markov decision processes generated by stateless pushdown automata. This class corresponds to 1 frac(1, 2)-player games over graphs generated by BPA systems or (equivalently) 1-exit recursive state machines. An extended reachability objective is specified by two...

Full description

Bibliographic Details
Main Authors: Brázdil, T, Brozek, V, Forejt, V, Kucera, A
Format: Journal article
Language:English
Published: 2008
_version_ 1797057323337777152
author Brázdil, T
Brozek, V
Forejt, V
Kucera, A
author_facet Brázdil, T
Brozek, V
Forejt, V
Kucera, A
author_sort Brázdil, T
collection OXFORD
description We consider a class of infinite-state Markov decision processes generated by stateless pushdown automata. This class corresponds to 1 frac(1, 2)-player games over graphs generated by BPA systems or (equivalently) 1-exit recursive state machines. An extended reachability objective is specified by two sets S and T of safe and terminal stack configurations, where the membership to S and T depends just on the top-of-the-stack symbol. The question is whether there is a suitable strategy such that the probability of hitting a terminal configuration by a path leading only through safe configurations is equal to (or different from) a given x ∈ {0, 1}. We show that the qualitative extended reachability problem is decidable in polynomial time, and that the set of all configurations for which there is a winning strategy is effectively regular. More precisely, this set can be represented by a deterministic finite-state automaton with a fixed number of control states. This result is a generalization of a recent theorem by Etessami and Yannakakis which says that the qualitative termination for 1-exit RMDPs (which exactly correspond to our 1 frac(1, 2)-player BPA games) is decidable in polynomial time. Interestingly, the properties of winning strategies for the extended reachability objectives are quite different from the ones for termination, and new observations are needed to obtain the result. As an application, we derive the EXPTIME-completeness of the model-checking problem for 1 frac(1, 2)-player BPA games and qualitative PCTL formulae. © 2007 Elsevier Inc. All rights reserved.
first_indexed 2024-03-06T19:34:41Z
format Journal article
id oxford-uuid:1ea2fe51-e461-4fee-834b-97c1463db9cf
institution University of Oxford
language English
last_indexed 2024-03-06T19:34:41Z
publishDate 2008
record_format dspace
spelling oxford-uuid:1ea2fe51-e461-4fee-834b-97c1463db9cf2022-03-26T11:17:32ZReachability in recursive Markov decision processes.Journal articlehttp://purl.org/coar/resource_type/c_dcae04bcuuid:1ea2fe51-e461-4fee-834b-97c1463db9cfEnglishSymplectic Elements at Oxford2008Brázdil, TBrozek, VForejt, VKucera, AWe consider a class of infinite-state Markov decision processes generated by stateless pushdown automata. This class corresponds to 1 frac(1, 2)-player games over graphs generated by BPA systems or (equivalently) 1-exit recursive state machines. An extended reachability objective is specified by two sets S and T of safe and terminal stack configurations, where the membership to S and T depends just on the top-of-the-stack symbol. The question is whether there is a suitable strategy such that the probability of hitting a terminal configuration by a path leading only through safe configurations is equal to (or different from) a given x ∈ {0, 1}. We show that the qualitative extended reachability problem is decidable in polynomial time, and that the set of all configurations for which there is a winning strategy is effectively regular. More precisely, this set can be represented by a deterministic finite-state automaton with a fixed number of control states. This result is a generalization of a recent theorem by Etessami and Yannakakis which says that the qualitative termination for 1-exit RMDPs (which exactly correspond to our 1 frac(1, 2)-player BPA games) is decidable in polynomial time. Interestingly, the properties of winning strategies for the extended reachability objectives are quite different from the ones for termination, and new observations are needed to obtain the result. As an application, we derive the EXPTIME-completeness of the model-checking problem for 1 frac(1, 2)-player BPA games and qualitative PCTL formulae. © 2007 Elsevier Inc. All rights reserved.
spellingShingle Brázdil, T
Brozek, V
Forejt, V
Kucera, A
Reachability in recursive Markov decision processes.
title Reachability in recursive Markov decision processes.
title_full Reachability in recursive Markov decision processes.
title_fullStr Reachability in recursive Markov decision processes.
title_full_unstemmed Reachability in recursive Markov decision processes.
title_short Reachability in recursive Markov decision processes.
title_sort reachability in recursive markov decision processes
work_keys_str_mv AT brazdilt reachabilityinrecursivemarkovdecisionprocesses
AT brozekv reachabilityinrecursivemarkovdecisionprocesses
AT forejtv reachabilityinrecursivemarkovdecisionprocesses
AT kuceraa reachabilityinrecursivemarkovdecisionprocesses