AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior
Overfitting of neural networks to training data is one of the most significant problems in machine learning. Bayesian neural networks (BNNs) are known to be robust against overfitting owing to their ability to model parameter uncertainty. Bayes by Backprop (BBB), a simple variational inference approach that optimizes variational parameters by backpropagation, has been proposed to train BNNs.
Main Authors: | Keigo Nishida, Makoto Taiji |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2022-01-01 |
Series: | IEEE Access |
Subjects: | Bayesian neural networks; covariate shift; decoupled weight decay; deep neural networks; uncertainty; variational inference |
Online Access: | https://ieeexplore.ieee.org/document/9874837/ |
_version_ | 1811211428358520832 |
---|---|
author | Keigo Nishida Makoto Taiji |
author_facet | Keigo Nishida Makoto Taiji |
author_sort | Keigo Nishida |
collection | DOAJ |
description | Overfitting of neural networks to training data is one of the most significant problems in machine learning. Bayesian neural networks (BNNs) are known to be robust against overfitting owing to their ability to model parameter uncertainty. Bayes by Backprop (BBB), a simple variational inference approach that optimizes variational parameters by backpropagation, has been proposed to train BNNs. However, many studies have encountered challenges in terms of variational inference for large-scale models, such as deep learning. Thus, this study proposed Adam with decoupled Bayes by Backprop (AdamB) to stabilize the training of BNNs through the application of the Adam estimator evaluation to the gradient of the neural network. The proposed approach stabilized the noisy gradient of the BBB and mitigated excess changes in the parameters. In addition, AdamB combined with a Gaussian scale mixture as a prior distribution can suppress the intrinsic increase in variational parameters. The proposed AdamB exhibited superior stability compared to training using Adam with vanilla BBB. Further, the covariate shift benchmark using image classification tasks indicated the higher reliability of AdamB than deep ensembles in the case of noise-type covariate shifts. The considerations for stable learning of BNNs by AdamB shown in image classification tasks are expected to be important insights for application to other domains. |
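The description above outlines the two ingredients of AdamB: a Gaussian scale mixture prior over the weights and an Adam-style update applied to BBB's noisy reparameterized gradients. The following is a minimal illustrative sketch of those ingredients, not the authors' implementation; all function names and hyperparameter values (`pi`, `sigma1`, `sigma2`, the softplus parameterization) are assumptions chosen for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def gsm_log_prior(w, pi=0.5, sigma1=1.0, sigma2=0.1):
    """Log density of a two-component Gaussian scale mixture prior:
    p(w) = pi * N(w; 0, sigma1^2) + (1 - pi) * N(w; 0, sigma2^2)."""
    def pdf(x, s):
        return np.exp(-0.5 * (x / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return float(np.sum(np.log(pi * pdf(w, sigma1) + (1.0 - pi) * pdf(w, sigma2))))

def sample_weights(mu, rho):
    """Reparameterization trick used by BBB: w = mu + softplus(rho) * eps,
    with eps ~ N(0, I), so gradients flow to the variational parameters."""
    sigma = np.log1p(np.exp(rho))  # softplus keeps sigma > 0
    eps = rng.standard_normal(mu.shape)
    return mu + sigma * eps, sigma

def adam_update(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam step on a (noisy) gradient. In a decoupled scheme, analogous
    to AdamW's decoupled weight decay, the moment estimates would see only
    the data-likelihood gradient, with the prior/KL term applied separately."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: draw weights, score them under the mixture prior, take one step.
mu, rho = np.zeros(4), np.full(4, -3.0)
w, sigma = sample_weights(mu, rho)
log_p = gsm_log_prior(w)
mu, m, v = adam_update(mu, grad=np.full(4, 0.1),
                       m=np.zeros(4), v=np.zeros(4), t=1)
```

The heavy-tailed mixture (a wide and a narrow Gaussian) is what lets the prior regularize large weights without collapsing small ones, which is the behavior the abstract credits with suppressing the intrinsic growth of the variational parameters.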
first_indexed | 2024-04-12T05:12:52Z |
format | Article |
id | doaj.art-b5af2bf7c60747b2823443506a48088f |
institution | Directory Open Access Journal |
issn | 2169-3536 |
language | English |
last_indexed | 2024-04-12T05:12:52Z |
publishDate | 2022-01-01 |
publisher | IEEE |
record_format | Article |
series | IEEE Access |
spelling | doaj.art-b5af2bf7c60747b2823443506a48088f | 2022-12-22T03:46:42Z | eng | IEEE | IEEE Access | ISSN 2169-3536 | 2022-01-01 | Vol. 10, pp. 92959–92970 | DOI 10.1109/ACCESS.2022.3203484 | Article 9874837 | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior | Keigo Nishida (https://orcid.org/0000-0001-9262-3392), Graduate School of Frontier Biosciences, Osaka University, Suita, Japan; Makoto Taiji, RIKEN Center for Biosystems Dynamics Research (BDR), Suita, Japan | https://ieeexplore.ieee.org/document/9874837/ | Keywords: Bayesian neural networks; covariate shift; decoupled weight decay; deep neural networks; uncertainty; variational inference |
spellingShingle | Keigo Nishida Makoto Taiji AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior IEEE Access Bayesian neural networks covariate shift decoupled weight decay deep neural networks uncertainty variational inference |
title | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior |
title_full | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior |
title_fullStr | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior |
title_full_unstemmed | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior |
title_short | AdamB: Decoupled Bayes by Backprop With Gaussian Scale Mixture Prior |
title_sort | adamb decoupled bayes by backprop with gaussian scale mixture prior |
topic | Bayesian neural networks covariate shift decoupled weight decay deep neural networks uncertainty variational inference |
url | https://ieeexplore.ieee.org/document/9874837/ |
work_keys_str_mv | AT keigonishida adambdecoupledbayesbybackpropwithgaussianscalemixtureprior AT makototaiji adambdecoupledbayesbybackpropwithgaussianscalemixtureprior |