Trading performance for stability in Markov decision processes
We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statis...
Main Authors: | Brazdil, T; Chatterjee, K; Forejt, V; Kucera, A |
---|---|
Format: | Journal article |
Published: | Elsevier 2016 |
_version_ | 1826286515232178176 |
---|---|
author | Brazdil, T; Chatterjee, K; Forejt, V; Kucera, A |
author_sort | Brazdil, T |
collection | OXFORD |
description | We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition. We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints. |
first_indexed | 2024-03-07T01:44:55Z |
format | Journal article |
id | oxford-uuid:98165cbf-b07b-4de3-977c-46b3131b216b |
institution | University of Oxford |
last_indexed | 2024-03-07T01:44:55Z |
publishDate | 2016 |
publisher | Elsevier |
record_format | dspace |
spelling | oxford-uuid:98165cbf-b07b-4de3-977c-46b3131b216b; 2022-03-27T00:04:36Z; Trading performance for stability in Markov decision processes; Journal article; http://purl.org/coar/resource_type/c_dcae04bc; uuid:98165cbf-b07b-4de3-977c-46b3131b216b; Symplectic Elements at Oxford; Elsevier; 2016; Brazdil, T; Chatterjee, K; Forejt, V; Kucera, A; We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition. We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints. |
title | Trading performance for stability in Markov decision processes |