Trading performance for stability in Markov decision processes

Bibliographic details
Main authors: Brazdil, T; Chatterjee, K; Forejt, V; Kucera, A
Format: Journal article
Published: Elsevier 2016
author Brazdil, T
Chatterjee, K
Forejt, V
Kucera, A
collection OXFORD
description We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition.
We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints.
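The description refers to two quantities under a strategy: the expected mean payoff and the statistical variance of the mean payoff. A minimal Python sketch of both follows, under simplifying assumptions made only for this illustration. The MDP, its state and action names, and the helpers `induced_chain` and `mean_payoff_stats` are hypothetical. The strategy is memoryless (it may randomize, but it keeps no memory, which the description says is needed in general). Every run is assumed to end in an absorbing state, so its mean payoff is simply that state's reward. This is not the authors' constraint-based method.

```python
from fractions import Fraction as F

# Hypothetical toy MDP, not taken from the paper: state -> action -> [(prob, successor)].
# "hi", "lo" and "mid" are absorbing (probability-1 self-loops), so the mean payoff
# of every run equals the reward of the absorbing state the run ends up in.
MDP = {
    "s0": {"risky": [(F(1, 2), "hi"), (F(1, 2), "lo")],
           "safe": [(F(1), "mid")]},
    "hi": {"stay": [(F(1), "hi")]},
    "lo": {"stay": [(F(1), "lo")]},
    "mid": {"stay": [(F(1), "mid")]},
}
REWARD = {"s0": F(0), "hi": F(2), "lo": F(0), "mid": F(1, 2)}


def induced_chain(mdp, strategy):
    """Markov chain obtained by fixing a memoryless randomized strategy.

    strategy maps state -> {action: probability}; states it omits play their
    first listed action with probability 1.
    """
    chain = {}
    for s, actions in mdp.items():
        choice = strategy.get(s, {next(iter(actions)): F(1)})
        assert sum(choice.values()) == 1, f"strategy at {s} is not a distribution"
        row = {}
        for a, p_a in choice.items():
            for p, t in actions[a]:
                row[t] = row.get(t, F(0)) + p_a * p
        chain[s] = row
    return chain


def mean_payoff_stats(mdp, reward, strategy, init, steps=100):
    """Exact E[MP] and Var[MP] from `init`, assuming all mass is absorbed within `steps`."""
    chain = induced_chain(mdp, strategy)
    absorbing = {s for s, row in chain.items() if row.get(s) == 1}
    dist = {init: F(1)}
    for _ in range(steps):  # push the state distribution forward one step at a time
        nxt = {}
        for s, p in dist.items():
            for t, q in chain[s].items():
                nxt[t] = nxt.get(t, F(0)) + p * q
        dist = nxt
    if any(p > 0 and s not in absorbing for s, p in dist.items()):
        raise ValueError("probability mass not yet absorbed; sketch assumption violated")
    mean = sum(p * reward[s] for s, p in dist.items())
    var = sum(p * (reward[s] - mean) ** 2 for s, p in dist.items())
    return mean, var


# Trade performance for stability by mixing "risky" (prob x) with "safe" (prob 1 - x).
for x in (F(0), F(1, 2), F(1)):
    m, v = mean_payoff_stats(MDP, REWARD, {"s0": {"risky": x, "safe": 1 - x}}, "s0")
    print(f"x={x!s}: E[MP]={m!s}, Var[MP]={v!s}")
# x=0: E[MP]=1/2, Var[MP]=0
# x=1/2: E[MP]=3/4, Var[MP]=9/16
# x=1: E[MP]=1, Var[MP]=1
```

Mixing the two actions at s0 moves the expected mean payoff and its variance together. For instance, asking for an expected mean payoff of at least 3/4 together with a variance of at most 9/16 rules out both deterministic choices at s0 in this toy model, while the 1/2 mix meets both bounds. This only illustrates the trade-off in the title; it does not reproduce the paper's results.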
first_indexed 2024-03-07T01:44:55Z
format Journal article
id oxford-uuid:98165cbf-b07b-4de3-977c-46b3131b216b
institution University of Oxford
last_indexed 2024-03-07T01:44:55Z
publishDate 2016
publisher Elsevier
record_format dspace
title Trading performance for stability in Markov decision processes