Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning
This paper proposes a model estimation method in offline Bayesian model-based reinforcement learning (MBRL). Learning a Bayes-adaptive Markov decision process (BAMDP) model using standard variational inference often suffers from poor predictive performance due to covariate shift between offline data...
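As a rough illustration of the importance-weighting idea named in the title: a standard variational objective for model learning weights all offline transitions equally, whereas an importance-weighted variant reweights each transition toward the distribution the planned policy will actually visit. The display below is a minimal sketch in assumed notation (variational posterior $q(\theta)$ over model parameters, transition model $p_\theta$, offline dataset $\mathcal{D}$, weights $w$); it is illustrative and not the paper's exact objective.

$$
\mathcal{L}_{\mathrm{IW}}(q) \;=\; \mathbb{E}_{q(\theta)}\!\Big[\sum_{(s,a,s') \in \mathcal{D}} w(s,a)\,\log p_\theta(s' \mid s,a)\Big] \;-\; \mathrm{KL}\!\big(q(\theta)\,\|\,p(\theta)\big),
\qquad
w(s,a) \;=\; \frac{d^{\pi}(s,a)}{d^{\mathcal{D}}(s,a)}.
$$

Setting $w \equiv 1$ recovers the standard (unweighted) variational objective that the abstract contrasts against; the exact form of the weights and of the penalty term in the paper's unified objective may differ.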
Main Authors: | Toru Hishinuma, Kei Senda |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
Online Access: | https://ieeexplore.ieee.org/document/10368011/ |
_version_ | 1797373108146929664 |
---|---|
author | Toru Hishinuma; Kei Senda |
author_facet | Toru Hishinuma; Kei Senda |
author_sort | Toru Hishinuma |
collection | DOAJ |
description | This paper proposes a model estimation method in offline Bayesian model-based reinforcement learning (MBRL). Learning a Bayes-adaptive Markov decision process (BAMDP) model using standard variational inference often suffers from poor predictive performance due to covariate shift between offline data and future data distributions. To tackle this problem, this paper applies an importance-weighting technique for covariate shift to variational inference learning of a BAMDP model. Consequently, this paper uses a unified objective function to optimize both model and policy. The unified objective function can be seen as an importance-weighted variational objective function for model training. It can also be interpreted as the expected return for policy planning penalized by the model’s error, which is a standard objective function in MBRL. This paper proposes an algorithm that optimizes the unified objective function. The proposed algorithm performs better than algorithms using standard variational inference without importance-weighting. Numerical experiments demonstrate the effectiveness of the proposed algorithm. |
first_indexed | 2024-03-08T18:45:37Z |
format | Article |
id | doaj.art-04737077dd514f7fbcf2f0292b99d110 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T18:45:37Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-04737077dd514f7fbcf2f0292b99d110; 2023-12-29T00:03:40Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 145579-145590; DOI 10.1109/ACCESS.2023.3345799; IEEE article no. 10368011; Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning; Toru Hishinuma (https://orcid.org/0000-0002-6922-1595) and Kei Senda (https://orcid.org/0000-0001-5720-6155), Department of Aeronautics and Astronautics, Kyoto University, Kyoto, Japan; abstract as in the description field above; https://ieeexplore.ieee.org/document/10368011/; Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
spellingShingle | Toru Hishinuma; Kei Senda; Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning; IEEE Access; Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
title | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_full | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_fullStr | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_full_unstemmed | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_short | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_sort | importance weighted variational inference model estimation for offline bayesian model based reinforcement learning |
topic | Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
url | https://ieeexplore.ieee.org/document/10368011/ |
work_keys_str_mv | AT toruhishinuma importanceweightedvariationalinferencemodelestimationforofflinebayesianmodelbasedreinforcementlearning AT keisenda importanceweightedvariationalinferencemodelestimationforofflinebayesianmodelbasedreinforcementlearning |