Markov decision processes with observation costs: framework and computation with a penalty scheme

We consider Markov decision processes where the state of the chain is only given at chosen observation times and at a cost. Optimal strategies involve the optimisation of observation times as well as the subsequent action values. We consider the finite horizon and discounted infinite horizon problems, as well as an extension with parameter uncertainty. By including the time elapsed from observations as part of the augmented Markov system, the value function satisfies a system of quasi-variational inequalities (QVIs). Such a class of QVIs can be seen as an extension of the interconnected obstacle problem. We prove a comparison principle for this class of QVIs, which implies uniqueness of solutions to our proposed problem. Penalty methods are then utilised to obtain arbitrarily accurate solutions. Finally, we perform numerical experiments on three applications which illustrate our framework.
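As a rough illustration of the type of scheme described above (a schematic only; the operators F and M below are generic placeholders, not the paper's exact formulation), an obstacle-type QVI couples a dynamic-programming operator with an intervention (observation) operator M,

\[ \min\big\{\, F(x, v, \mathcal{L}v),\; v(x) - \mathcal{M}v(x) \,\big\} = 0, \]

and a penalty scheme with parameter \(\rho > 0\) relaxes the constraint \(v \ge \mathcal{M}v\) into a single penalised equation

\[ F(x, v_\rho, \mathcal{L}v_\rho) - \rho\,\big(\mathcal{M}v_\rho(x) - v_\rho(x)\big)^{+} = 0, \]

whose solutions \(v_\rho\) approximate \(v\) as \(\rho \to \infty\).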

Bibliographic Details
Main Authors: Reisinger, C, Tam, J
Format: Journal article
Language: English
Published: INFORMS 2024
Institution: University of Oxford