Trading performance for stability in Markov decision processes

We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statis...


Bibliographic details
Main authors:	Brazdil, T; Chatterjee, K; Forejt, V; Kucera, A
Format:	Journal article
Published:	Elsevier 2016

_version_	1826286515232178176
author	Brazdil, T Chatterjee, K Forejt, V Kucera, A
author_facet	Brazdil, T Chatterjee, K Forejt, V Kucera, A
author_sort	Brazdil, T
collection	OXFORD
description	We study controller synthesis problems for finite-state Markov decision processes, where the objective is to optimize the expected mean-payoff performance and stability (also known as variability in the literature). We argue that the basic notion of expressing the stability using the statistical variance of the mean payoff is sometimes insufficient, and propose an alternative definition. We show that a strategy ensuring both the expected mean payoff and the variance below given bounds requires randomization and memory, under both the above definitions. We then show that the problem of finding such a strategy can be expressed as a set of constraints.
first_indexed	2024-03-07T01:44:55Z
format	Journal article
id	oxford-uuid:98165cbf-b07b-4de3-977c-46b3131b216b
institution	University of Oxford
last_indexed	2024-03-07T01:44:55Z
publishDate	2016
publisher	Elsevier
record_format	dspace
title	Trading performance for stability in Markov decision processes
