Planning with hidden parameter polynomial MDPs

For many applications of Markov Decision Processes (MDPs), the transition function cannot be specified exactly. Bayes-Adaptive MDPs (BAMDPs) extend MDPs to consider transition probabilities governed by latent parameters. To act optimally in BAMDPs, one must maintain a belief distribution over the la...

Bibliographic Details
Main Authors:	Costen, C; Rigter, M; Lacerda, B; Hawes, N
Format:	Conference item
Language:	English
Published:	Association for the Advancement of Artificial Intelligence, 2023

collection	OXFORD
description	For many applications of Markov Decision Processes (MDPs), the transition function cannot be specified exactly. Bayes-Adaptive MDPs (BAMDPs) extend MDPs to consider transition probabilities governed by latent parameters. To act optimally in BAMDPs, one must maintain a belief distribution over the latent parameters. Typically, this distribution is described by a set of sample (particle) MDPs and associated weights that represent the likelihood of a sample MDP being the true underlying MDP. However, as the number of dimensions of the latent parameter space increases, the number of sample MDPs required to sufficiently represent the belief distribution grows exponentially. Thus, maintaining an accurate belief in the form of a set of sample MDPs over complex latent spaces is computationally intensive, which in turn affects the performance of planning for these models. In this paper, we propose an alternative approach for maintaining the belief over the latent parameters. We consider a class of BAMDPs where the transition probabilities can be expressed in closed form as a polynomial of the latent parameters, and outline a method to maintain a closed-form belief distribution for the latent parameters, which results in an accurate belief representation. Furthermore, the closed-form representation does away with the need to tune the number of sample MDPs required to represent the belief. We evaluate two domains and empirically show that the polynomial, closed-form belief representation results in better plans than a sampling-based belief representation.
first_indexed	2024-03-07T07:51:19Z
id	oxford-uuid:c9dfe4f3-8da1-4d5a-8389-73ac13092447
institution	University of Oxford
last_indexed	2024-03-07T07:51:19Z
record_format	dspace
spelling	oxford-uuid:c9dfe4f3-8da1-4d5a-8389-73ac13092447; 2023-07-17T10:55:06Z; Planning with hidden parameter polynomial MDPs; Conference item; http://purl.org/coar/resource_type/c_5794; uuid:c9dfe4f3-8da1-4d5a-8389-73ac13092447; English; Symplectic Elements; Association for the Advancement of Artificial Intelligence; 2023; Costen, C; Rigter, M; Lacerda, B; Hawes, N

Similar Items