Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning
This paper proposes a model estimation method in offline Bayesian model-based reinforcement learning (MBRL). Learning a Bayes-adaptive Markov decision process (BAMDP) model using standard variational inference often suffers from poor predictive performance due to covariate shift between offline data...
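As a rough illustration of the importance-weighting idea named in the title: a standard variational objective for model learning weights all offline transitions equally, whereas an importance-weighted variant reweights each transition toward the distribution the planned policy will actually visit. The display below is a minimal sketch in assumed notation (variational posterior $q(\theta)$ over model parameters, transition model $p_\theta$, offline dataset $\mathcal{D}$, weights $w$); it is illustrative and not the paper's exact objective.

$$
\mathcal{L}_{\mathrm{IW}}(q) \;=\; \mathbb{E}_{q(\theta)}\!\Big[\sum_{(s,a,s') \in \mathcal{D}} w(s,a)\,\log p_\theta(s' \mid s,a)\Big] \;-\; \mathrm{KL}\!\big(q(\theta)\,\|\,p(\theta)\big),
\qquad
w(s,a) \;=\; \frac{d^{\pi}(s,a)}{d^{\mathcal{D}}(s,a)}.
$$

Setting $w \equiv 1$ recovers the standard (unweighted) variational objective that the abstract contrasts against; the exact form of the weights and of the penalty term in the paper's unified objective may differ.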
Main Authors: | Toru Hishinuma, Kei Senda |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2023-01-01 |
Series: | IEEE Access |
Subjects: | Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
Online Access: | https://ieeexplore.ieee.org/document/10368011/ |
_version_ | 1797373108146929664 |
---|---|
author | Toru Hishinuma; Kei Senda |
author_facet | Toru Hishinuma; Kei Senda |
author_sort | Toru Hishinuma |
collection | DOAJ |
description | This paper proposes a model estimation method in offline Bayesian model-based reinforcement learning (MBRL). Learning a Bayes-adaptive Markov decision process (BAMDP) model using standard variational inference often suffers from poor predictive performance due to covariate shift between offline data and future data distributions. To tackle this problem, this paper applies an importance-weighting technique for covariate shift to variational inference learning of a BAMDP model. Consequently, this paper uses a unified objective function to optimize both model and policy. The unified objective function can be seen as an importance-weighted variational objective function for model training. It can also be interpreted as the expected return for policy planning penalized by the model’s error, which is a standard objective function in MBRL. This paper proposes an algorithm that optimizes the unified objective function. The proposed algorithm performs better than algorithms using standard variational inference without importance-weighting. Numerical experiments demonstrate the effectiveness of the proposed algorithm. |
first_indexed | 2024-03-08T18:45:37Z |
format | Article |
id | doaj.art-04737077dd514f7fbcf2f0292b99d110 |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-03-08T18:45:37Z |
publishDate | 2023-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-04737077dd514f7fbcf2f0292b99d110; 2023-12-29T00:03:40Z; eng; IEEE; IEEE Access; ISSN 2169-3536; 2023-01-01; vol. 11, pp. 145579-145590; DOI 10.1109/ACCESS.2023.3345799; IEEE article no. 10368011; Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning; Toru Hishinuma (https://orcid.org/0000-0002-6922-1595) and Kei Senda (https://orcid.org/0000-0001-5720-6155), Department of Aeronautics and Astronautics, Kyoto University, Kyoto, Japan; abstract as in the description field above; https://ieeexplore.ieee.org/document/10368011/; Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
spellingShingle | Toru Hishinuma; Kei Senda; Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning; IEEE Access; Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
title | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_full | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_fullStr | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_full_unstemmed | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_short | Importance-Weighted Variational Inference Model Estimation for Offline Bayesian Model-Based Reinforcement Learning |
title_sort | importance weighted variational inference model estimation for offline bayesian model based reinforcement learning |
topic | Bayesian model-based reinforcement learning; decision-aware reinforcement learning; offline reinforcement learning |
url | https://ieeexplore.ieee.org/document/10368011/ |
work_keys_str_mv | AT toruhishinuma importanceweightedvariationalinferencemodelestimationforofflinebayesianmodelbasedreinforcementlearning AT keisenda importanceweightedvariationalinferencemodelestimationforofflinebayesianmodelbasedreinforcementlearning |