Markov decision processes with observation costs: framework and computation with a penalty scheme

We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimisation of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problem...

Bibliographic Details
Main Authors:	Reisinger, C; Tam, J
Format:	Journal article
Language:	English
Published:	INFORMS 2024

_version_	1826313115579449344
author	Reisinger, C Tam, J
author_facet	Reisinger, C Tam, J
author_sort	Reisinger, C
collection	OXFORD
description	We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimisation of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems, as well as an extension with parameter uncertainty. By including the time elapsed since the last observation as part of the augmented Markov system, the value function satisfies a system of quasi-variational inequalities (QVIs). Such a class of QVIs can be seen as an extension to the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies uniqueness of solutions to our proposed problem. Penalty methods are then utilised to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications which illustrate our framework.
first_indexed	2024-04-09T03:58:28Z
format	Journal article
id	oxford-uuid:d1fcf32a-8202-40ec-a1c2-f17ad9336bec
institution	University of Oxford
language	English
last_indexed	2024-09-25T04:07:55Z
publishDate	2024
publisher	INFORMS
record_format	dspace
title	Markov decision processes with observation costs: framework and computation with a penalty scheme

Similar Items